AWS Lambda is a serverless compute service that runs code in response to events without provisioning or managing servers. It solves the operational overhead problem—no patching, no capacity planning, no idle-resource cost—by executing functions on demand and billing only for actual execution time (millisecond granularity). A team picks Lambda for event-driven workloads, APIs with variable traffic, background processing, and glue code in data pipelines. The broader "serverless" paradigm extends this to managed databases (DynamoDB, Aurora Serverless), queues (SQS), storage (S3), and orchestration (Step Functions), enabling entire applications that scale to zero and to millions of requests with no infrastructure management.
Lambda runs inside Firecracker microVMs—lightweight VMs that boot in ~125ms and provide strong hardware-level isolation between tenants. Each Lambda function invocation runs in its own execution environment (a sandboxed container-like unit) that goes through this lifecycle:
[Event Source] --> [Lambda Service] --> [Find/Create Execution Environment]
                                                |
                    ┌───────────────────────────┴───────────────────────────┐
                    ▼                                                       ▼
     [Cold Start: download code,                             [Warm Start: reuse env,
      start runtime, run init code]                           skip init, jump to handler]
                    └───────────────────────────┬───────────────────────────┘
                                                ▼
                                         [Handler runs]
                                                ▼
                                  [Response returned to caller]
                                                ▼
                                 [Env kept warm ~5–15 min idle]
Cold start anatomy: download the deployment package, start the language runtime, then run init code (everything outside the handler); only then does the handler execute. Typical p99 cold start durations run from roughly 100–500 ms for Node.js and Python to several seconds for Java and .NET (before SnapStart).
Warm Start: The execution environment is reused for subsequent invocations. Init code does not re-run. The handler receives a fresh event but global state (DB connections, SDK clients, caches) persists.
Key implication: Initialize SDK clients, establish DB connections, and load large configuration objects outside the handler so they are reused across warm invocations.
// CORRECT: initialization outside the handler
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { SSMClient, GetParameterCommand } = require('@aws-sdk/client-ssm');

const ddb = new DynamoDBClient({ region: 'us-east-1' }); // created once per environment

let cachedConfig = null;
async function getConfig() {
  if (!cachedConfig) {
    const ssm = new SSMClient({ region: 'us-east-1' });
    const res = await ssm.send(new GetParameterCommand({ Name: '/myapp/config', WithDecryption: true }));
    cachedConfig = JSON.parse(res.Parameter.Value);
  }
  return cachedConfig;
}

exports.handler = async (event) => {
  const config = await getConfig(); // SSM call skipped on warm starts
  // ...handler logic using ddb and config
};
Lambda concurrency is per-function, per-region. Each concurrent execution requires its own environment.
Account concurrency limit (e.g., 1000)
├── Function A: reserved = 200 (guaranteed 200, up to 200)
├── Function B: reserved = 100 (guaranteed 100, up to 100)
└── Unreserved pool: 700 shared among all other functions
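Required concurrency can be estimated with Little's law (concurrency ≈ arrival rate × average duration). A minimal sketch of that sizing calculation:

```javascript
// Little's law: concurrency ≈ arrival rate (req/s) × average duration (s).
// Useful for sizing reserved or provisioned concurrency.
function requiredConcurrency(requestsPerSecond, avgDurationMs) {
  return Math.ceil(requestsPerSecond * (avgDurationMs / 1000));
}

console.log(requiredConcurrency(100, 200)); // 100 req/s at 200 ms → 20
console.log(requiredConcurrency(28, 200));  // 28 req/s at 200 ms → 6
```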
Provisioned Concurrency is the right answer for latency-sensitive APIs where cold start spikes are unacceptable (e.g., real-time inference endpoints, checkout flows). Cost: you pay for provisioned-concurrency-hours even when no invocations occur.
Lambda does not expose CPU as a separate knob. Memory allocation directly controls CPU allocation in a linear relationship:
CPU scales linearly with memory: at 1,769 MB a function gets the equivalent of one full vCPU, so doubling memory doubles the CPU share. For CPU-bound workloads (JSON parsing, image processing, ML inference), doubling memory can halve duration, often reducing cost despite the higher per-ms rate. AWS Lambda Power Tuning automates this optimization.
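A sketch of the memory–cost tradeoff; the rate is the published x86 list price, and the durations are hypothetical for a CPU-bound task:

```javascript
const PRICE_PER_GB_SECOND = 0.0000166667; // x86 list price; arm64 is ~20% lower

function durationCost(memoryMB, durationMs) {
  return (memoryMB / 1024) * (durationMs / 1000) * PRICE_PER_GB_SECOND;
}

// Hypothetical CPU-bound task: doubling memory doubles CPU and halves duration,
// so the duration cost is unchanged while the caller waits half as long.
const at512 = durationCost(512, 800);   // 0.4 GB-s worth of compute
const at1024 = durationCost(1024, 400); // also 0.4 GB-s
console.log(at512, at1024);
```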
| Dimension | ZIP Package | Container Image |
|---|---|---|
| Max size | 50 MB zipped / 250 MB unzipped | 10 GB |
| Supported runtimes | Managed runtimes (Node 18/20, Python 3.12, Java 17/21, etc.) | Any (custom base image) |
| Cold start | Generally faster | Slower for large images; mitigated by Lambda's image caching |
| Build tooling | zip, SAM, CDK | Docker |
| Best for | Standard runtimes, small deps | ML models, large binaries, custom runtimes |
Container image cold starts are optimized through Lambda's container image caching—images are cached at the AZ level after first use. Use Lambda-provided base images (public.ecr.aws/lambda/python:3.12) for best caching performance; they are pre-cached at Lambda's infrastructure level.
A Lambda Layer is a ZIP archive published separately and shared across functions. Layers are mounted at /opt/ in the execution environment.
Typical layer contents: shared dependencies (e.g., numpy, pandas for data functions), common utilities, and custom runtimes.

# SAM template using layers
Globals:
  Function:
    Layers:
      - !Ref CommonUtilsLayer
      - arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:38

Resources:
  CommonUtilsLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      ContentUri: layers/common-utils/
      CompatibleRuntimes: [nodejs20.x]
| Limit | Value |
|---|---|
| Compressed ZIP (direct upload) | 50 MB |
| Uncompressed ZIP (from S3) | 250 MB |
| /tmp ephemeral storage | 512 MB – 10,240 MB (configurable) |
| Container image | 10 GB |
| Environment variables | 4 KB total |
| Concurrent executions per account | 1,000 (default, soft limit) |
| Function timeout | 15 minutes max |
REST API (v1): Full-featured, supports usage plans, API keys, request/response mapping templates, caching. Higher cost.
HTTP API (v2): Lower latency (up to ~60% less) and roughly 70% cheaper than REST APIs, simpler configuration, supports JWT authorizers natively; ideal for most Lambda backends.
// HTTP API event shape (v2)
// event.requestContext.http.method, event.rawPath, event.body (string)
exports.handler = async (event) => {
  const body = JSON.parse(event.body ?? '{}');
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: 'ok', received: body }),
  };
};
Lambda Function URLs (no API Gateway): Simple HTTPS endpoint per function, supports IAM or no auth. Cheaper, no API Gateway features. Good for webhooks, simple backends.
See Messaging & Events for SQS, SNS, EventBridge, and Kinesis architecture details.
Lambda polls SQS via an event source mapping (ESM). The ESM long-polls the queue, batches messages, and invokes Lambda. Key behaviors:
Return batchItemFailures in the response (requires ReportBatchItemFailures on the ESM) to requeue only the failed messages, not the entire batch:

exports.handler = async (event) => {
  const failures = [];
  for (const record of event.Records) {
    try {
      const body = JSON.parse(record.body);
      await processMessage(body);
    } catch (err) {
      console.error('Failed to process:', record.messageId, err);
      failures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures: failures }; // only failed messages become visible again
};
SNS pushes directly to Lambda (no polling); the function is a subscription endpoint on the topic. Invocation is asynchronous—the Lambda service accepts the event with a 202, and SNS retries delivery failures per its retry policy. There is no batch size concept; each SNS message triggers one invocation. Use a Lambda DLQ for failed async invocations.
S3 event notifications push to Lambda asynchronously when objects are created, deleted, etc. Critical: S3 notifications are at-least-once—handle idempotency. Common pattern: use the S3 object key as an idempotency key against DynamoDB.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    // Process object at s3://bucket/key
  }
};
Lambda polls the stream via ESM. Receives ordered, batched change records (INSERT, MODIFY, REMOVE) with before/after images. Used for: cache invalidation, replication, triggering downstream workflows.
Configure bisectBatchOnFunctionError, max retry attempts, and a DLQ on the ESM to handle poison-pill records.

EventBridge: rule-based routing of events from AWS services, SaaS providers (via EventBridge partner integrations), or custom events. Lambda is a target. Key for event-driven architectures:
Supports scheduled invocation via cron and rate expressions (e.g., rate(5 minutes)).

// EventBridge rule pattern: trigger Lambda on EC2 instance state change
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": { "state": ["terminated"] }
}
Kinesis Data Streams: similar to the DynamoDB Streams ESM—Lambda polls shards, and records are ordered within a shard. Enhanced fan-out gives each consumer a dedicated 2 MB/s per shard (vs. 2 MB/s per shard shared across all standard consumers). The same failure handling patterns apply—blocking retry per shard until resolved.
Cognito User Pools triggers Lambda at specific points in the auth lifecycle (e.g., pre-sign-up validation, post-confirmation provisioning, pre-token-generation claim customization, custom auth challenges).
Thin handler pattern: Handler does minimal work—parse input, validate, delegate to a service layer, return result. Keeps handler testable.
// service.js — pure business logic, no Lambda dependency
async function processOrder(orderId, userId) { /* ... */ }

// handler.js — thin shell
const { processOrder } = require('./service');
exports.handler = async (event) => {
  const { orderId, userId } = JSON.parse(event.body);
  if (!orderId || !userId) return { statusCode: 400, body: 'Missing fields' };
  const result = await processOrder(orderId, userId);
  return { statusCode: 200, body: JSON.stringify(result) };
};
Middy middleware (Node.js): Declarative middleware pipeline for cross-cutting concerns.

const middy = require('@middy/core');
const httpJsonBodyParser = require('@middy/http-json-body-parser');
const httpErrorHandler = require('@middy/http-error-handler');
const ssm = require('@middy/ssm'); // default export

const baseHandler = async (event, context) => {
  const config = context.config; // injected by the ssm middleware (setToContext)
  return { statusCode: 200, body: JSON.stringify({ result: 'ok' }) };
};

exports.handler = middy(baseHandler)
  .use(httpJsonBodyParser())
  .use(ssm({ fetchData: { config: '/myapp/config' }, setToContext: true, cacheExpiry: 60_000 }))
  .use(httpErrorHandler());
| Approach | Use Case | Rotation | Encryption | Cost |
|---|---|---|---|---|
| Env vars | Non-sensitive config, feature flags | Manual redeploy | KMS optional | Free |
| SSM Parameter Store Standard | Config, non-secret params | Manual | KMS optional | Free (values ≤ 4 KB) |
| SSM Parameter Store Advanced | Large params (up to 8 KB), parameter policies | Manual | KMS optional | $0.05/param/month |
| Secrets Manager | DB passwords, API keys | Automatic | KMS mandatory | $0.40/secret/month |
Best practice: Use Secrets Manager for credentials that should rotate (RDS passwords, API keys for third-party services). Use Parameter Store for configuration that is sensitive but static. Use environment variables only for non-sensitive configuration that changes with deployments.
AWS Parameters and Secrets Lambda Extension caches Parameter Store and Secrets Manager values locally, eliminating the network call on every warm invocation:
// With the extension installed as a layer, fetch via localhost
const response = await fetch(
`http://localhost:2773/systemsmanager/parameters/get?name=%2Fmyapp%2Fconfig`,
{ headers: { 'X-Aws-Parameters-Secrets-Token': process.env.AWS_SESSION_TOKEN } }
);
const { Parameter } = await response.json();
Synchronous invocations (API Gateway, Lambda Function URL): The caller receives the error synchronously. No automatic Lambda-level retry. The caller decides to retry.
Asynchronous invocations (S3, SNS, EventBridge): Lambda retries up to 2 additional times with exponential backoff (1 min, then 2 min). After all retries fail, the event is discarded or sent to the function's DLQ (SQS queue or SNS topic).
// CloudFormation: configure a function-level DLQ for async invocations
//   DeadLetterConfig:
//     TargetArn: !GetAtt FailedEventsDLQ.Arn

// Proper error handling in the handler
exports.handler = async (event) => {
  try {
    await riskyOperation(event);
  } catch (err) {
    if (isRetryable(err)) throw err;        // rethrow → Lambda retries
    await logToDeadLetterStore(event, err); // non-retryable: store and swallow
    return { statusCode: 200 };             // ack to avoid pointless retries
  }
};
ESM (SQS, Kinesis, DynamoDB Streams) failure behavior: failed batches are retried until they succeed or expire; bound this with BisectBatchOnFunctionError, MaximumRetryAttempts, and an ESM-level DLQ.

Lambda Power Tuning: an open-source Step Functions state machine (published by AWS) that runs a function at multiple memory configurations, measures duration and cost, and recommends the optimal setting.
# Deploy via SAR (Serverless Application Repository), then invoke the state machine with:
{
  "lambdaARN": "arn:aws:lambda:us-east-1:123:function:my-function",
  "powerValues": [128, 256, 512, 1024, 1769, 3008],
  "num": 10,
  "payload": { "test": true },
  "parallelInvocation": true,
  "strategy": "cost"  // or "speed" or "balanced"
}
The tool generates a visualization URL with cost/duration curves. The "balanced" strategy finds the knee of the curve—the point of diminishing returns on performance per dollar. Always run power tuning before setting memory on production functions.
Fan-out: one event triggers multiple parallel Lambda executions—via an SNS topic with multiple Lambda subscriptions, an EventBridge rule with multiple targets, or a router function invoking workers asynchronously:
[S3 Upload Event]
|
[Lambda: Router]
|
┌───┼───┐
▼ ▼ ▼
[Thumb] [OCR] [Virus Scan] ← parallel Lambda invocations
└───┬───┘
▼
[Step Functions: WaitForTaskToken to aggregate]
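The router step above can be sketched with async (`'Event'`) invocations. In this sketch the invoke call is injected so the logic is testable; in production you would pass something like `(params) => lambdaClient.send(new InvokeCommand(params))` using @aws-sdk/client-lambda:

```javascript
// Fan out one event to several worker functions via async invocation.
// `invoke` is injected: in production, wrap InvokeCommand from @aws-sdk/client-lambda.
async function fanOut(invoke, workerFunctions, payload) {
  return Promise.all(workerFunctions.map((fn) => invoke({
    FunctionName: fn,
    InvocationType: 'Event', // async: Lambda accepts with 202, doesn't wait for the worker
    Payload: JSON.stringify(payload),
  })));
}

// Example with a stub invoke that just records the calls:
const calls = [];
fanOut((p) => { calls.push(p.FunctionName); return Promise.resolve(); },
  ['thumb', 'ocr', 'virus-scan'], { bucket: 'b', key: 'k' })
  .then(() => console.log(calls)); // [ 'thumb', 'ocr', 'virus-scan' ]
```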
Saga pattern: manages distributed transactions across multiple services without a global transaction coordinator. Each step publishes an event on success or triggers a compensating transaction on failure.
Choreography (EventBridge): Services react to events independently. Decoupled but harder to trace. Orchestration (Step Functions): A central orchestrator (state machine) calls each service and handles failures with explicit compensation steps. Easier to understand the full flow.
Order Saga (Step Functions orchestration):
[Reserve Inventory] → success → [Charge Payment] → success → [Send Confirmation]
↓ fail ↓ fail
[No compensation needed] [Release Inventory (compensate)]
Event sourcing: instead of storing current state, store all events that led to that state. The current state is derived by replaying events.
CQRS: separate the write model (commands) from the read model (queries). With Lambda:
[POST /orders] [GET /orders/summary]
↓ ↓
[Write Lambda] [Read Lambda]
↓ ↓
[DynamoDB table] [DynamoDB GSI / OpenSearch]
↓
[DynamoDB Streams → Sync Lambda → OpenSearch]
See VPC & Networking for subnets, routing tables, security groups, and VPC Endpoints in depth.
Attach Lambda to a VPC only when the function needs to access VPC-private resources:
Do not attach to a VPC if the function only calls public AWS services (DynamoDB, S3, SSM, Secrets Manager)—their public endpoints are reachable without VPC attachment. If the function must sit in a VPC anyway, add VPC Endpoints so those calls stay off the NAT path.
When Lambda is VPC-attached, it needs an Elastic Network Interface (ENI) in your VPC. Historically an ENI was created per environment at cold start, adding 10–15 seconds. Since AWS's Hyperplane ENI overhaul (rolled out 2019–2020), shared ENIs are created once at function configuration time, so VPC attachment now adds little to cold start latency.
VPC-attached Lambda functions in private subnets cannot reach the internet by default. For internet access, route 0.0.0.0/0 to a NAT Gateway in a public subnet (diagram below).
NAT Gateway cost: $0.045/hour + $0.045/GB data processed. For high-throughput Lambda functions calling external APIs, NAT Gateway costs can dominate. Consider: PrivateLink, VPC endpoints, or restructuring to avoid NAT.
Private Subnet (Lambda)
↓
[Route table: 0.0.0.0/0 → NAT GW]
↓
NAT Gateway (Public Subnet, Elastic IP)
↓
Internet Gateway
↓
Internet
Lambda automatically sends stdout/stderr to CloudWatch Logs. Each function has a log group (/aws/lambda/<function-name>). Each execution environment writes to its own log stream.
Structured logging is essential for querying with CloudWatch Insights:
// Use structured JSON logging instead of console.log strings
const logger = {
  info: (msg, meta = {}) => console.log(JSON.stringify({
    level: 'INFO', message: msg, timestamp: new Date().toISOString(), ...meta,
  })),
  error: (msg, meta = {}) => console.error(JSON.stringify({
    level: 'ERROR', message: msg, timestamp: new Date().toISOString(), ...meta,
  })),
};

exports.handler = async (event) => {
  const requestId = event.requestContext?.requestId;
  logger.info('Handler invoked', { requestId, path: event.rawPath });
  try {
    const result = await processEvent(event);
    logger.info('Success', { requestId, resultId: result.id });
    return { statusCode: 200, body: JSON.stringify(result) };
  } catch (err) {
    logger.error('Handler failed', { requestId, error: err.message, stack: err.stack });
    throw err;
  }
};
CloudWatch Logs Insights query examples:
-- Find slowest invocations
fields @timestamp, @duration, @memoryUsed
| filter @type = "REPORT"
| sort @duration desc
| limit 20
-- Count errors by function
fields @timestamp
| filter @message like /ERROR/
| stats count(*) as errorCount by bin(5m)
Log retention: Set explicitly (default: never expire). Unchecked retention is a cost trap. Typical: 30–90 days for production, 7 days for dev.
A CloudWatch agent extension layer that collects enhanced system-level metrics: CPU utilization, memory utilization, disk I/O, network, cold start counts. Enables dashboards not available in default Lambda metrics. Enable via the LambdaInsightsExtension layer.
X-Ray provides distributed tracing for Lambda. It captures the service map, per-invocation segments (including init), subsegments for AWS SDK and outbound HTTP calls, and any custom subsegments you add:
const AWSXRay = require('aws-xray-sdk-core');
// AWS SDK v2 (bundled through nodejs16.x):
const AWS = AWSXRay.captureAWS(require('aws-sdk')); // all SDK calls traced
// AWS SDK v3 (nodejs18.x+): wrap each client instead:
// const ddb = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));

exports.handler = async (event) => {
  const segment = AWSXRay.getSegment();
  const subsegment = segment.addNewSubsegment('custom-db-operation');
  try {
    await runQuery();
    subsegment.close();
  } catch (err) {
    subsegment.addError(err);
    subsegment.close();
    throw err;
  }
};
Enable tracing in SAM/CloudFormation:
Globals:
  Function:
    Tracing: Active  # Active = Lambda samples and records traces; PassThrough = trace only if the caller sends a sampled trace header
Sampling: By default, X-Ray records the first request each second plus 5% of additional requests. Configure custom sampling rules in the X-Ray console. High-volume production functions should rely on sampling to control costs.
| Metric | What It Signals |
|---|---|
| Errors | Unhandled exceptions / handler threw |
| Throttles | Concurrency limit hit; raise reserved concurrency or request a quota increase |
| Duration (p50, p99) | Performance; p99 spikes indicate cold starts or slow dependencies |
| ConcurrentExecutions | Approaching account limit; plan for reserved concurrency |
| IteratorAge (Streams) | Consumer lag; ESM can't keep up with stream throughput |
| DeadLetterErrors | Failed to write to DLQ; permissions or DLQ capacity issue |
Lambda pricing has two components: a per-request charge ($0.20 per million requests) and a duration charge billed per GB-second.
Duration cost formula:
Cost = (invocations × duration_seconds × memory_GB) × price_per_GB_second
For 1M invocations at 512 MB and 200 ms average duration: 1,000,000 × 0.2 s × 0.5 GB = 100,000 GB-seconds ≈ $1.67, plus $0.20 in request charges ≈ $1.87/month.
The free tier covers 1M requests/month and 400,000 GB-seconds/month forever (not just first year).
Switch from x86_64 to arm64 (Graviton) architecture: duration pricing is ~20% lower, and many workloads also run faster. Verify that native dependencies (e.g., bcrypt, sharp) ship ARM builds.

# SAM: switch to arm64
Properties:
  Architectures: [arm64]
  Runtime: nodejs20.x
Real-world savings: Companies have reported 30–40% cost reduction by switching to ARM64 after factoring in both price reduction and performance improvement.
# Auto Scaling provisioned concurrency via target tracking
Resources:
  FunctionAliasProdScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      ServiceNamespace: lambda
      ResourceId: !Sub "function:${MyFunction}:prod"
      ScalableDimension: lambda:function:ProvisionedConcurrency
      MinCapacity: 2
      MaxCapacity: 100
  ScaleOutPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: ProvisionedConcurrencyTracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref FunctionAliasProdScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 0.7  # scale when 70% of provisioned concurrency is utilized
        PredefinedMetricSpecification:
          PredefinedMetricType: LambdaProvisionedConcurrencyUtilization
Both run code at CloudFront edge locations but differ significantly.
| Dimension | Lambda@Edge | CloudFront Functions |
|---|---|---|
| Runtime | Node.js and Python managed runtimes | JavaScript only (restricted subset) |
| Max execution time | 5s (viewer), 30s (origin) | 1ms |
| Memory | 128 MB–10 GB | 2 MB |
| Max package size | 1 MB (viewer), 50 MB (origin) | 10 KB |
| Network access | Yes (external HTTP calls) | No |
| Cost | $0.60/M requests + duration | $0.10/M invocations |
| Triggers | Viewer request/response, origin request/response | Viewer request/response only |
| Use cases | Auth, A/B testing, URL rewriting, server-side rendering, dynamic content | Simple rewrites, header manipulation, query string normalization |
Lambda@Edge is deployed to us-east-1 and replicated to edge locations automatically. It cannot access VPC resources.
CloudFront Functions are ultra-lightweight and designed for simple, sub-millisecond request/response manipulation. They are ~6× cheaper than Lambda@Edge for viewer request/response events.
// CloudFront Function: add security headers (viewer response trigger)
function handler(event) {
var response = event.response;
var headers = response.headers;
headers['strict-transport-security'] = { value: 'max-age=63072000; includeSubdomains; preload' };
headers['x-content-type-options'] = { value: 'nosniff' };
headers['x-frame-options'] = { value: 'DENY' };
return response;
}
// Lambda@Edge: JWT verification (viewer request trigger)
// Can make external calls (e.g., JWKS endpoint), use full Node.js crypto
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const token = request.headers['authorization']?.[0]?.value?.replace('Bearer ', '');
  if (!token || !(await verifyJwt(token))) { // verifyJwt: your validation logic
    return { status: '401', body: 'Unauthorized' };
  }
  return request; // continue to origin
};
Decision rule: default to CloudFront Functions; reach for Lambda@Edge only when you need network access, more than ~1 ms of compute, larger memory or packages, or origin-side triggers.
Orchestration (Step Functions): A central state machine explicitly controls the flow—calls services, handles errors, manages retries, branches conditionally. The workflow is visible and auditable.
Choreography (EventBridge/SNS): Services react to events from other services. No central coordinator. Highly decoupled but flow is implicit and harder to trace.
Choose orchestration when:
Choose choreography when:
Standard Workflows: Exactly-once execution, durable execution history (90 days), 1 year max duration. Best for business-critical workflows.
Express Workflows: At-least-once, no execution history in console (use CloudWatch), 5 min max duration, 100K executions/sec. Best for high-volume event processing.
{
  "Comment": "Order Processing Workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:ValidateOrder",
      "Retry": [{ "ErrorEquals": ["Lambda.ServiceException"], "MaxAttempts": 3 }],
      "Catch": [{ "ErrorEquals": ["ValidationError"], "Next": "HandleInvalidOrder" }],
      "Next": "ProcessPayment"
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
      "Parameters": {
        "FunctionName": "ProcessPayment",
        "Payload": {
          "orderId.$": "$.orderId",
          "taskToken.$": "$$.Task.Token"
        }
      },
      "HeartbeatSeconds": 3600,
      "Next": "FulfillOrder"
    },
    "FulfillOrder": { "Type": "Task", "Resource": "...", "End": true },
    "HandleInvalidOrder": { "Type": "Fail", "Error": "InvalidOrder" }
  }
}
WaitForTaskToken pattern: Lambda initiates an async operation (e.g., human approval, third-party payment) and returns immediately. Step Functions pauses and resumes when the callback SendTaskSuccess or SendTaskFailure is called with the token.
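The callback side can be sketched like this; the send call is injected so the logic is testable, and in production you would wrap SendTaskSuccessCommand (or SendTaskFailureCommand on error) from @aws-sdk/client-sfn:

```javascript
// The worker receives { taskToken, ... } in its payload. When the async work
// finishes, it reports back so the paused state machine resumes.
// `sendTaskSuccess` is injected: in production, wrap SendTaskSuccessCommand
// from @aws-sdk/client-sfn, e.g. (params) => sfn.send(new SendTaskSuccessCommand(params)).
async function completeTask(sendTaskSuccess, taskToken, result) {
  const params = { taskToken, output: JSON.stringify(result) };
  await sendTaskSuccess(params); // on failure, call SendTaskFailure with the same token
  return params;
}
```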
SDK integrations (optimized integrations): Step Functions can call DynamoDB, SQS, SNS, ECS, Glue, and other services directly, without Lambda as a wrapper—reducing latency and cost.
Q: What is a Lambda cold start and how do you reduce it? A: A cold start occurs when Lambda must provision a new execution environment—boot the Firecracker microVM, initialize the runtime, and run the init code outside the handler. It adds latency to the first invocation on a new or idle environment. Reduction strategies: (1) choose a fast-starting runtime (Node.js, Python over Java); (2) minimize deployment package size—fewer modules = faster require/import; (3) move all SDK initialization outside the handler; (4) use Provisioned Concurrency for latency-sensitive functions; (5) for containers, use Lambda base images; (6) reduce init code complexity—lazy-load what you can.
Q: How does Lambda concurrency work, and what happens when you hit the limit? A: Lambda concurrency is the number of execution environments running simultaneously. The account has a regional limit (default 1,000). When the limit is hit, new invocations are throttled—for synchronous callers (API Gateway), they receive a 429 error. For async invocations, Lambda retries. Mitigation: request a limit increase, use reserved concurrency to protect critical functions, implement exponential backoff in callers, and use SQS as a buffer to absorb traffic spikes without throttling.
Q: Explain the difference between SQS Standard and FIFO queues for Lambda triggers. A: Standard queues offer nearly unlimited throughput and at-least-once delivery with best-effort ordering; Lambda scales out and processes multiple batches in parallel. FIFO queues guarantee exactly-once processing and strict ordering within a message group; Lambda's ESM processes at most one batch per message group ID concurrently, preserving order. Use Standard when ordering doesn't matter and throughput is paramount. Use FIFO when processing order matters (e.g., financial transactions, state transitions) and throughput requirements are moderate (3,000 messages/sec per FIFO queue with batching).
Q: When would you use Step Functions instead of just chaining Lambda functions directly? A: Direct chaining (Lambda calls Lambda) is an anti-pattern for complex workflows: errors are hard to handle, state must be passed through all calls, and there's no visibility into where an execution failed. Use Step Functions when: (1) you need retry logic and error handling per step; (2) the workflow has branches or parallel paths; (3) you need audit trail/execution history; (4) long-running workflows exceed Lambda's 15-min timeout; (5) you need human approval steps (WaitForTaskToken); (6) you want to call AWS services without Lambda intermediaries using SDK integrations. See Step Functions for full orchestration coverage.
Q: How do you handle idempotency in Lambda?
A: Lambda can be invoked more than once for the same event (async retries, at-least-once triggers like S3/SNS). Idempotent handlers produce the same outcome regardless of how many times they execute. Implementation: (1) use a unique event/request ID as an idempotency key; (2) store processed IDs in DynamoDB with a TTL equal to the retry window; (3) check if the key exists before processing; (4) use DynamoDB conditional writes to atomically "claim" processing. The AWS Lambda Powertools library (@aws-lambda-powertools/idempotency) provides a decorator pattern that handles this automatically.
// AWS Lambda Powertools idempotency (Node.js)
const { makeIdempotent } = require('@aws-lambda-powertools/idempotency');
const { DynamoDBPersistenceLayer } = require('@aws-lambda-powertools/idempotency/dynamodb');
const persistenceStore = new DynamoDBPersistenceLayer({ tableName: 'IdempotencyTable' });
exports.handler = makeIdempotent(
async (event) => {
// this body runs exactly once per unique event
await processPayment(event.paymentId, event.amount);
},
{ persistenceStore, config: { eventKeyJmesPath: 'paymentId' } }
);
Q: What's the difference between Lambda's function-level DLQ and an SQS queue's DLQ?
A: Lambda's function-level DLQ handles failed asynchronous invocations (events from S3, SNS, EventBridge) after all Lambda-level retries are exhausted. It does not apply to SQS-triggered invocations. For SQS triggers, the DLQ must be configured on the SQS queue itself (via the queue's redrive policy), which receives messages after maxReceiveCount delivery attempts. Additionally, when using partial batch failure reporting (batchItemFailures), failed messages return to the SQS queue individually and eventually go to the SQS DLQ—not Lambda's DLQ.
Q: How does Lambda pricing work, and how would you optimize costs for a high-volume function? A: Lambda charges per request ($0.20/M) plus per GB-second of duration. For optimization: (1) run Power Tuning to find the memory sweet spot—often lower memory with longer duration is cheaper than more memory with shorter duration, but sometimes the opposite; (2) switch to ARM64/Graviton2 for 20% duration cost savings; (3) reduce average duration by optimizing hot paths, using connection pooling, and caching SSM/Secrets Manager results via the Parameters and Secrets extension; (4) use batch windows for SQS to reduce invocation count; (5) for very high, steady traffic, compare Lambda cost to a long-running Fargate container.
Q: You have a Lambda function processing DynamoDB Streams that keeps failing on a bad record. What do you do?
A: DynamoDB Streams processing is blocking—a failed batch keeps retrying until it expires or succeeds, which can halt all downstream processing for that shard. Solutions: (1) set BisectBatchOnFunctionError: true on the ESM—splits the batch in half recursively to isolate the bad record; (2) configure MaximumRetryAttempts to limit retries; (3) configure a DLQ on the event source mapping (not the Lambda function DLQ) to capture permanently failed records for later inspection; (4) implement poison pill detection in the handler and route bad records to an S3 error bucket or SQS dead letter queue manually.
Q: Explain the trade-offs between Lambda@Edge and CloudFront Functions. A: CloudFront Functions are ultra-cheap ($0.10/M), execute in under 1ms, but are limited to 2 MB memory, 10 KB code, no network access, and only viewer request/response triggers. They are ideal for simple URL rewrites, header normalization, and cookie manipulation. Lambda@Edge is 6× more expensive but supports full Node.js, up to 10 GB memory, 50 MB packages, 30s execution at origin, and can make external HTTP calls. Use Lambda@Edge for JWT verification, A/B testing with external config, dynamic image resizing, and server-side rendering. Use CloudFront Functions for everything else to minimize cost.
Q: How would you design a serverless system to handle 100,000 file uploads per hour with processing?
A: (1) Ingestion: Pre-signed S3 URLs for direct browser-to-S3 upload (no Lambda in the upload path—avoids size limits and costs). (2) Trigger: S3 ObjectCreated event → SQS Standard queue (decouples upload rate from processing rate, absorbs bursts). (3) Processing Lambda: ESM from SQS with batch size 10, batch window 30s; processes files in parallel up to the concurrency limit and implements partial batch failure reporting. (4) Concurrency: Set reserved concurrency to prevent starving other functions; 100,000 uploads/hour ≈ 28/second, which at 200 ms/file needs only ~6 concurrent executions. (5) Failure path: SQS DLQ for failed files → SNS alert → manual review. (6) Observability: Structured logs, a CloudWatch dashboard on SQS queue depth and ApproximateNumberOfMessagesNotVisible, and Lambda Errors and Duration.
Lambda Extensions are processes that run alongside the Lambda function handler in the same execution environment. They enable integrations for monitoring, security, and configuration without modifying function code.
External extensions register through the Extensions API and subscribe to INVOKE and/or SHUTDOWN lifecycle events.

| Extension | Purpose | Impact |
|---|---|---|
| Parameters & Secrets | Cache SSM Parameter Store + Secrets Manager values; HTTP API on port 2773 | Reduces calls to SSM/ASM; up to ~75% cost reduction vs direct SDK calls |
| CloudWatch Lambda Insights | Per-invocation performance metrics (init duration, memory used, network) | ~1.3 MB layer; minimal overhead |
| ADOT (OpenTelemetry) | Collect traces/metrics via OpenTelemetry Collector; send to X-Ray, CloudWatch, third-party | Standardized instrumentation |
| Datadog Agent | Forward metrics/logs/traces to Datadog | Requires Datadog account |
// Instead of calling the SSM SDK (a fresh HTTP call per invocation),
// use the Parameters & Secrets extension (local cache with TTL)
const PARAMS_PORT = 2773; // extension listens on this port

async function getParameter(name) {
  const response = await fetch(
    `http://localhost:${PARAMS_PORT}/systemsmanager/parameters/get?name=${encodeURIComponent(name)}&withDecryption=true`,
    { headers: { 'X-Aws-Parameters-Secrets-Token': process.env.AWS_SESSION_TOKEN } }
  );
  const { Parameter } = await response.json();
  return Parameter.Value;
}

async function getSecret(secretId) {
  const response = await fetch(
    `http://localhost:${PARAMS_PORT}/secretsmanager/get?secretId=${encodeURIComponent(secretId)}`,
    { headers: { 'X-Aws-Parameters-Secrets-Token': process.env.AWS_SESSION_TOKEN } }
  );
  const { SecretString } = await response.json();
  return JSON.parse(SecretString);
}
// Cache TTL: default 300s for SSM, 300s for Secrets Manager (configurable via env vars)
// PARAMETERS_SECRETS_EXTENSION_CACHE_SIZE: max cached items (default 1000)
// SSM_PARAMETER_STORE_TTL: override TTL in seconds
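On top of the extension's own TTL cache, a module-scope memo can skip even the localhost round trip on warm invocations. A sketch — `cached` and the injected `fetchParam` (e.g., the `getParameter` helper above) are illustrative names:

```javascript
// Module-scope map: survives across warm invocations of the same execution environment
const memo = new Map();

async function cached(name, fetchParam, ttlMs = 60000) {
  const hit = memo.get(name);
  if (hit && Date.now() - hit.at < ttlMs) return hit.value; // fresh enough — no HTTP hop at all
  const value = await fetchParam(name); // falls through to the extension's local HTTP API
  memo.set(name, { value, at: Date.now() });
  return value;
}
```

The TTL bounds staleness per environment; a rotated secret is picked up within `ttlMs` of the next invocation.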
SnapStart addresses the Java JVM's notoriously long cold starts (3–10s) by pre-initializing the runtime and taking a snapshot.
The snapshot is captured at the end of the INIT phase (so your init code runs before the checkpoint), and SnapStart requires invoking a published version (PublishVersion: true). Runtime hooks use the open-source CRaC API:
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;
// Implement CRaC hooks to handle state that shouldn't persist across snapshots
public class Handler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {
  private DatabaseConnection dbConnection;

  public Handler() {
    Core.getGlobalContext().register(this); // register for checkpoint/restore callbacks
    // Initialize expensive resources once
    dbConnection = new DatabaseConnection(System.getenv("DB_URL"));
  }

  @Override
  public void beforeCheckpoint(Context<? extends Resource> context) {
    // Close connections before snapshot — they'll be invalid after restore
    dbConnection.close();
  }

  @Override
  public void afterRestore(Context<? extends Resource> context) {
    // Re-establish connections after restore
    dbConnection = new DatabaseConnection(System.getenv("DB_URL"));
  }

  @Override
  public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event,
      com.amazonaws.services.lambda.runtime.Context ctx) {
    // normal handler logic using dbConnection
    return new APIGatewayProxyResponseEvent().withStatusCode(200);
  }
}
Uniqueness caveat: a snapshot is cloned across many execution environments, so any random state captured during init is duplicated — regenerate it in afterRestore (e.g., re-seed via SecureRandom.generateSeed()) if your code depends on per-environment uniqueness.

Function URLs provide a built-in HTTPS endpoint for invoking Lambda directly without API Gateway.
https://<url-id>.lambda-url.<region>.on.aws
| AuthType | Use Case | IAM Required? |
|---|---|---|
| NONE | Public API, webhooks (Stripe, GitHub) | No — anyone can invoke |
| AWS_IAM | Internal service-to-service | Yes — SigV4-signed requests |
// CDK: Lambda Function URL with CORS
const fnUrl = myFunction.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.NONE,
  cors: {
    allowedOrigins: ['https://myapp.com'],
    allowedHeaders: ['content-type', 'authorization'],
    allowedMethods: [lambda.HttpMethod.POST, lambda.HttpMethod.GET],
    maxAge: cdk.Duration.hours(1),
  },
});
new cdk.CfnOutput(this, 'FunctionUrl', { value: fnUrl.url });
Function URLs support RESPONSE_STREAM invoke mode — return data progressively (like SSE/chunked transfer):
exports.handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
  responseStream.setContentType('text/plain');
  const tokens = ['Hello', ' ', 'from', ' ', 'streaming', ' ', 'Lambda'];
  for (const token of tokens) {
    responseStream.write(token); // each chunk reaches the client immediately
    await new Promise(resolve => setTimeout(resolve, 100));
  }
  responseStream.end();
});
Use case: streaming LLM completions from Lambda without buffering the full response.
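On the client side, the streamed body can be consumed incrementally with the standard web-streams reader (Node 18+ or browsers); `collectStream` is an illustrative helper — a real UI would render each chunk as it arrives rather than concatenating:

```javascript
// Consume a streamed HTTP body chunk by chunk
async function collectStream(body) {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true }); // append each chunk as it arrives
  }
  return text;
}

// usage (FUNCTION_URL is a placeholder):
// const res = await fetch(FUNCTION_URL);
// const full = await collectStream(res.body);
```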
| Criterion | Function URL | API Gateway |
|---|---|---|
| Cost | Free (Lambda invoke cost only) | $1.00/M requests (HTTP API) to $3.50/M (REST API) + data transfer |
| Features | Single endpoint, no routing | Multiple routes, auth, throttling, caching |
| Custom domain | Via CloudFront | Native support |
| Response streaming | Yes (RESPONSE_STREAM mode) | No (buffered only) |
| Best for | Webhooks, simple single-function APIs, streaming | Multi-function APIs, enterprise features |
Lambda Powertools (TypeScript/Python/Java/.NET) is an AWS-maintained suite of utilities for production Lambda functions.
import { Logger } from '@aws-lambda-powertools/logger';
import type { APIGatewayProxyEvent, Context } from 'aws-lambda';

const logger = new Logger({ serviceName: 'orders-service', logLevel: 'INFO' });

export const handler = async (event: APIGatewayProxyEvent, context: Context) => {
  // Inject correlation ID, cold start flag, and function metadata automatically
  logger.addContext(context);
  // Structured log — every field is queryable in CloudWatch Logs Insights
  logger.info('Processing order', {
    orderId: event.pathParameters?.id,
    userId: event.requestContext.authorizer?.claims.sub,
  });
  try {
    const result = await processOrder(event);
    logger.info('Order processed', { orderId: result.id, durationMs: result.duration });
    return { statusCode: 200, body: JSON.stringify(result) };
  } catch (err) {
    logger.error('Order processing failed', err as Error);
    return { statusCode: 500, body: 'Internal error' };
  }
};
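The structured fields emitted above can then be queried directly in CloudWatch Logs Insights — for example (field names match the `logger.info` call):

```
fields @timestamp, orderId, userId
| filter message = 'Processing order'
| sort @timestamp desc
| limit 20
```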
import { Tracer } from '@aws-lambda-powertools/tracer';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

const tracer = new Tracer({ serviceName: 'orders-service' });
const ddb = tracer.captureAWSv3Client(new DynamoDBClient({})); // auto-trace all DDB calls

export const handler = async (event: any) => {
  const segment = tracer.getSegment();
  const subsegment = segment.addNewSubsegment('## processPayment');
  tracer.setSegment(subsegment); // make the subsegment current so annotations attach to it
  tracer.putAnnotation('orderId', event.orderId); // indexed — searchable in X-Ray
  tracer.putMetadata('orderDetails', event); // not indexed — visible in the trace detail
  try {
    const result = await chargePayment(event);
    tracer.putAnnotation('paymentStatus', 'SUCCESS');
    return result;
  } catch (err) {
    subsegment.addError(err as Error);
    throw err;
  } finally {
    subsegment.close();
    tracer.setSegment(segment); // restore the parent segment
  }
};
import { AppConfigProvider } from '@aws-lambda-powertools/parameters/appconfig';

const provider = new AppConfigProvider({ environment: 'production', application: 'orders' });

export const handler = async (event: any) => {
  // transform: 'json' parses the JSON configuration profile; values are cached between calls
  const flags = await provider.get('feature-flags', { transform: 'json' }) as { newCheckoutFlow?: boolean };
  if (flags.newCheckoutFlow) {
    return await newCheckout(event);
  }
  return await legacyCheckout(event);
};