
AWS ElastiCache with Valkey: Complete Setup Guide


This comprehensive guide covers everything you need to know about setting up and connecting to AWS ElastiCache using Valkey (Redis-compatible). It includes step-by-step setup instructions, connection code examples, and solutions to common issues based on real-world experience.


Overview

AWS ElastiCache is a fully managed in-memory caching service that supports Redis-compatible engines like Valkey. It provides high-performance, scalable caching for applications requiring fast data access. ElastiCache offers two main deployment options:

  • Valkey Serverless: Fully managed, auto-scaling option with minimal configuration

  • Valkey Node-Based Cluster: Traditional cluster deployment with more control over configuration

Prerequisites

Before you begin, ensure you have:

  • An active AWS account with appropriate IAM permissions

  • Access to AWS Management Console

  • A configured VPC with appropriate subnets

  • Basic understanding of AWS networking (VPC, Security Groups, Subnets)

  • Node.js application (if connecting from Node.js)

Understanding Deployment Options

Valkey Serverless

Best for:

  • Applications with variable or unpredictable traffic

  • Simplified operations with minimal configuration

  • Auto-scaling requirements

  • Development and testing environments

Key characteristics:

  • Automatically scales based on demand

  • Serverless endpoint (proxy-based)

  • No cluster management required

  • Single Redis client connection (not cluster mode)

  • ⚠️ Not compatible with BullMQ (cannot configure maxmemory-policy)

Valkey Node-Based Cluster

Best for:

  • Applications requiring specific node configurations

  • BullMQ or other queue systems (requires node-based deployment with custom parameters)

  • Predictable workloads with known capacity

  • Fine-grained control over caching infrastructure

Key characteristics:

  • Manual cluster configuration

  • Support for cluster mode enabled/disabled

  • Direct node access

  • More configuration options

Important for BullMQ Users: If you plan to use BullMQ with Node.js, you must choose the Node-Based Cluster deployment option. BullMQ requires:

  • Direct node access (not available in Serverless)

  • Custom maxmemory-policy set to noeviction (cannot be configured in Serverless)

  • See the Configuring ElastiCache for BullMQ section for complete setup instructions.

Creating Valkey Serverless Cache

Follow these steps to create a Valkey Serverless cache:

Step 1: Access ElastiCache Console

  1. Sign in to the AWS Management Console

  2. Navigate to ElastiCache Console

  3. In the left navigation pane, select Valkey caches

  4. Click Create Valkey cache button

Step 2: Configure Cache Settings

  1. Deployment option: Select Serverless (default)

  2. Cache settings:

    • Name: Enter a descriptive name (e.g., my-project-cache)

    • Description: (Optional) Add a description for your cache

  3. Configuration: Leave the default settings selected for initial setup

  4. Network settings: ElastiCache will automatically configure networking

Step 3: Create and Wait

  1. Review your configuration

  2. Click Create to create the cache

  3. Wait for the cache status to change to ACTIVE (typically takes 5-10 minutes)

  4. Once active, you can retrieve the endpoint URL from the cache details page

Step 4: Get Connection Endpoint

After the cache is created:

  1. Select your cache from the list

  2. Go to the Connectivity & security tab

  3. Copy the Configuration endpoint (e.g., my-cache.serverless.use1.cache.amazonaws.com)
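If you prefer the CLI, the endpoint can also be retrieved with the AWS CLI (this is a sketch that assumes the `my-project-cache` name from Step 2 and a configured AWS account):

```shell
# Fetch the serverless cache endpoint address
aws elasticache describe-serverless-caches \
  --serverless-cache-name my-project-cache \
  --query "ServerlessCaches[0].Endpoint.Address" \
  --output text
```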

Creating Valkey Node-Based Cluster

Follow these steps to create a Valkey Node-Based Cluster:

Step 1: Access ElastiCache Console

  1. Sign in to the AWS Management Console

  2. Navigate to ElastiCache Console

  3. In the left navigation pane, select Valkey caches

  4. Click Create Valkey cache button

Step 2: Select Deployment Option

  1. Deployment option: Select Design your own cache

  2. Creation method: Select Cluster cache

  3. Cluster mode: Choose Disabled (for simpler setup) or Enabled (for sharding)

Step 3: Configure Cluster Settings

Cluster Information:

  • Name: Enter a cluster name (e.g., my-project-cluster)

  • Description: (Optional) Add a description

  • Engine version: Use the latest compatible version

  • Port: Keep default 6379

  • Parameter group: Use default or select custom

  • Node type: Choose based on your memory and CPU requirements (e.g., cache.t3.micro for testing)

  • Number of replicas: Set to 0 for single node, or add replicas for high availability

Step 4: Configure Subnet Group

In the Connectivity section:

  1. Subnet groups:

    • If you don't have a subnet group, select Create a new subnet group

    • Name: Enter subnet group name (e.g., my-subnet-group)

    • Description: Add a description

    • VPC: Select your VPC from the dropdown

    • Subnets: Select at least 2 subnets in different availability zones

  2. Click Next

Step 5: Configure Security Settings

  1. In the Selected security groups section, click Manage

  2. Select appropriate security groups:

    • Choose existing security group OR

    • Create a new security group with inbound rule allowing port 6379

  3. Encryption:

    • Enable Encryption at rest (recommended for production)

    • Enable Encryption in transit (TLS) (recommended)

Step 6: Configure Backup and Maintenance (Optional)

  1. Automatic backups: Enable for production environments

  2. Maintenance window: Choose preferred maintenance window

  3. SNS notifications: (Optional) Configure notifications

Step 7: Review and Create

  1. Click Next to review all settings

  2. Verify your configuration

  3. Click Create to create the cluster

  4. Wait for the cluster status to become Available (typically 10-15 minutes)

Step 8: Get Connection Endpoint

After the cluster is created:

  1. Select your cluster from the list

  2. Go to the Details tab

  3. Copy the Primary endpoint (or Configuration endpoint if cluster mode is enabled)

Connecting to ElastiCache

Understanding Connection Types

AWS ElastiCache has three different Redis connection patterns:

| Deployment Type | Correct Client | Wrong Client |
| --- | --- | --- |
| Self-managed Redis | new Redis() | - |
| ElastiCache Node-Based (Cluster Mode Disabled) | new Redis() | new Redis.Cluster() |
| ElastiCache Serverless | new Redis() | new Redis.Cluster() |
| ElastiCache Node-Based (Cluster Mode Enabled) | new Redis.Cluster() | new Redis() |

Critical: Serverless endpoints use a proxy architecture and do NOT expose individual cluster nodes. Always use new Redis() for Serverless, never new Redis.Cluster().
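As a sanity check, the client-selection rule above can be captured in a tiny helper (the deployment labels below are made up for illustration):

```javascript
// Encodes the table above: only a node-based cluster with cluster mode
// ENABLED needs the cluster client; everything else uses new Redis().
function clientFor(deployment) {
  // deployment: "self-managed" | "serverless" | "node-cmd-disabled" | "node-cmd-enabled"
  return deployment === "node-cmd-enabled" ? "Redis.Cluster" : "Redis";
}

clientFor("serverless"); // "Redis" - never Redis.Cluster for Serverless
```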

Connecting to Valkey Serverless

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST, // e.g., my-cache.serverless.use1.cache.amazonaws.com
  port: 6379,
  tls: {}, // TLS is required for AWS ElastiCache
  connectTimeout: 10000,
  maxRetriesPerRequest: null, // Important for BullMQ
});

// Event listeners for monitoring
redis.on("connect", () => {
  console.log("✅ Redis connected successfully");
});

redis.on("error", (err) => {
  console.error("❌ Redis connection error:", err);
});

redis.on("close", () => {
  console.log("Redis connection closed");
});

// Example usage
async function testConnection() {
  try {
    // Set a value
    await redis.set("test-key", "Hello ElastiCache!");
    console.log("✅ Set operation successful");

    // Get the value
    const value = await redis.get("test-key");
    console.log("✅ Retrieved value:", value);

    // Clean up
    await redis.del("test-key");
  } catch (error) {
    console.error("❌ Operation failed:", error);
  }
}

testConnection();

Connecting to Valkey Node-Based Cluster (Cluster Mode Disabled)

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST, // Primary endpoint
  port: 6379,
  tls: {},
  connectTimeout: 10000,
  retryStrategy: (times) => {
    const delay = Math.min(times * 50, 2000);
    return delay;
  },
});

redis.on("connect", () => console.log("Redis connected"));
redis.on("error", (err) => console.error("Redis error:", err));

Connecting to Valkey Node-Based Cluster (Cluster Mode Enabled)

import Redis from "ioredis";

const cluster = new Redis.Cluster(
  [
    {
      host: process.env.REDIS_HOST, // Configuration endpoint
      port: 6379,
    },
  ],
  {
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
      tls: {},
      connectTimeout: 10000,
    },
    clusterRetryStrategy: (times) => {
      const delay = Math.min(times * 50, 2000);
      return delay;
    },
  },
);

cluster.on("connect", () => console.log("Cluster connected"));
cluster.on("error", (err) => console.error("Cluster error:", err));

Using with BullMQ

import { Queue, Worker } from "bullmq";
import Redis from "ioredis";

// Connection configuration
const connection = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  maxRetriesPerRequest: null, // Critical for BullMQ
});

// Create a queue
const queue = new Queue("my-queue", { connection });

// Add a job
await queue.add("job-name", { data: "example" });

// Create a worker
const worker = new Worker(
  "my-queue",
  async (job) => {
    console.log("Processing job:", job.id);
    // Process job here
  },
  { connection },
);

Configuring ElastiCache for BullMQ

If you're using BullMQ with AWS ElastiCache, there's a critical configuration requirement you must complete before your queues will work properly.

Why This Configuration Is Required

BullMQ requires Redis to use the noeviction maxmemory policy. This policy ensures that Redis never evicts keys when memory is full, which is essential for queue reliability. If keys are evicted, you could lose jobs from your queue.

Important Notes:

  • ⚠️ Serverless ElastiCache is NOT compatible with BullMQ due to an incompatible default maxmemory-policy that cannot be changed

  • ✅ You must use Node-Based Cluster deployment for BullMQ

  • Default parameter groups in AWS cannot be modified, so you must create a custom parameter group

Common Error Without This Configuration

Without the correct maxmemory-policy, you may encounter errors such as:

OOM command not allowed when used memory > 'maxmemory'

Or jobs may silently disappear from your queue when memory pressure occurs.

Step-by-Step Configuration Guide

Step 1: Create a Custom Parameter Group

  1. Navigate to ElastiCache Console → Parameter Groups (in the left sidebar)

  2. Click Create parameter group

  3. Configure the parameter group:

    • Family: Select the engine family that matches your cluster (e.g., valkey7 for Valkey 7, or redis7 for Redis 7)

    • Name: Enter a descriptive name (e.g., bullmq-parameters)

    • Description: Add a description (e.g., Custom parameters for BullMQ queues)

  4. Click Create

Step 2: Modify the maxmemory-policy Parameter

  1. In the Parameter Groups list, find your newly created parameter group

  2. Click on the parameter group name to open it

  3. Click Edit or Edit parameters

  4. In the search box, type: maxmemory-policy

  5. Change the value from volatile-lru (default) to noeviction

  6. Click Save changes

Step 3: Apply the Custom Parameter Group to Your Cluster

For existing clusters:

  1. Go to ElastiCache Console → Redis caches (or Valkey caches)

  2. Select your cluster by clicking the checkbox

  3. Click Modify

  4. Scroll down to Cluster settings section

  5. In the Parameter group dropdown, select your custom parameter group (e.g., bullmq-parameters)

  6. Scroll to the bottom and click Preview changes

  7. Review the changes

  8. Click Modify to apply

For new clusters:

During cluster creation (Step 3 of "Creating Valkey Node-Based Cluster"):

  • In the Cluster settings section

  • Find the Parameter group field

  • Select your custom parameter group from the dropdown

Step 4: Restart Required (For Existing Clusters)

⚠️ Important: Changing the parameter group requires a cluster restart for the changes to take effect.

  1. After modifying, AWS will schedule the change

  2. Choose to apply the change:

    • Immediately: Cluster will restart now (brief downtime)

    • During maintenance window: Applied during next maintenance window

  3. Monitor the cluster status until it returns to Available

Step 5: Verify the Configuration

After the cluster is available, verify the configuration:

Option 1: Using Redis CLI from EC2:

# Connect to your ElastiCache instance
redis-cli -h your-cache.region.cache.amazonaws.com -p 6379 --tls

# Check the maxmemory-policy
CONFIG GET maxmemory-policy

Expected output:

1) "maxmemory-policy"
2) "noeviction"

Option 2: Using ioredis in your application:

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
});

async function verifyConfig() {
  const policy = await redis.config("GET", "maxmemory-policy");
  console.log("maxmemory-policy:", policy[1]); // Should output: noeviction

  if (policy[1] !== "noeviction") {
    console.error("⚠️ WARNING: maxmemory-policy is not set to noeviction!");
    console.error("BullMQ may not work correctly.");
  } else {
    console.log("✅ Configuration is correct for BullMQ");
  }
}

verifyConfig();

Complete BullMQ Setup Example

Once your parameter group is configured correctly:

import { Queue, Worker, QueueEvents } from "bullmq";
import Redis from "ioredis";

// Create connection with BullMQ-optimized settings
const connection = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  maxRetriesPerRequest: null, // Required for BullMQ
  enableReadyCheck: false,
  maxLoadingRetryTime: 5000,
});

// Verify configuration on startup
connection.config("GET", "maxmemory-policy").then(([, policy]) => {
  if (policy !== "noeviction") {
    console.error(
      '❌ CRITICAL: maxmemory-policy must be "noeviction" for BullMQ',
    );
    process.exit(1);
  }
  console.log("✅ Redis configuration verified for BullMQ");
});

// Create a queue
const myQueue = new Queue("my-queue", {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 1000,
    },
    removeOnComplete: {
      count: 100, // Keep last 100 completed jobs
      age: 3600, // Keep jobs for 1 hour
    },
    removeOnFail: {
      count: 500, // Keep last 500 failed jobs
    },
  },
});

// Create a worker
const worker = new Worker(
  "my-queue",
  async (job) => {
    console.log(`Processing job ${job.id} with data:`, job.data);
    // Your job processing logic here
    return { success: true };
  },
  {
    connection: connection.duplicate(), // Important: use duplicate connection
    concurrency: 10,
    limiter: {
      max: 100,
      duration: 1000,
    },
  },
);

// Event listeners
worker.on("completed", (job) => {
  console.log(`✅ Job ${job.id} completed`);
});

worker.on("failed", (job, err) => {
  console.error(`❌ Job ${job.id} failed:`, err.message);
});

// Queue events for monitoring
const queueEvents = new QueueEvents("my-queue", { connection });

queueEvents.on("waiting", ({ jobId }) => {
  console.log(`Job ${jobId} is waiting`);
});

// Add jobs to the queue
async function addJobs() {
  await myQueue.add("process-data", { userId: 123, action: "process" });
  await myQueue.add("send-email", { to: "user@example.com", subject: "Hello" });
  console.log("Jobs added to queue");
}

addJobs();

// Graceful shutdown
process.on("SIGTERM", async () => {
  console.log("Shutting down...");
  await worker.close();
  await myQueue.close();
  await queueEvents.close();
  await connection.quit();
  process.exit(0);
});
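With attempts: 3 and the exponential backoff configured above, BullMQ spaces retries using its documented formula, delay * 2^(attemptsMade - 1); for the 1000 ms base delay that works out to:

```javascript
// BullMQ's exponential backoff formula: delay * 2^(attemptsMade - 1)
const backoffDelay = (baseMs, attemptsMade) => baseMs * 2 ** (attemptsMade - 1);

backoffDelay(1000, 1); // 1000 ms before the first retry
backoffDelay(1000, 2); // 2000 ms before the second (and final) retry
```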

Best Practices for BullMQ with ElastiCache

  1. Always verify maxmemory-policy on startup - Add a check in your application initialization

  2. Leave memory headroom - ElastiCache fixes maxmemory based on the node type; tune the reserved-memory-percent parameter (25% is a common starting point) to leave room for backups and failover overhead

  3. Monitor memory usage - Set up CloudWatch alarms for memory usage

  4. Use job retention policies - Configure removeOnComplete and removeOnFail to prevent memory bloat

  5. Duplicate connections for workers - Use connection.duplicate() for workers to avoid connection issues

  6. Plan for durability - AOF (Append Only File) is not supported on recent ElastiCache engine versions, so rely on automatic backups and Multi-AZ replicas for queue durability

  7. Test failover scenarios - If using replicas, test that your application handles failover correctly

Quick Reference: Parameter Group Settings for BullMQ

| Parameter | Recommended Value | Reason |
| --- | --- | --- |
| maxmemory-policy | noeviction | Required - prevents job loss |
| reserved-memory-percent | 25 | Leaves headroom for overhead (maxmemory itself is fixed by node type) |
| timeout | 300 | Close idle connections after 5 minutes |
| tcp-keepalive | 300 | Keep connections alive |
| appendonly | yes (where supported) | Persistence for queue durability; unavailable on recent engine versions |
| appendfsync | everysec (optional) | Balance between performance and safety |
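A startup check can compare live CONFIG GET output against the values in this table. The helper below is a hypothetical sketch (parameter names as in the table, values as the strings Redis returns):

```javascript
// Returns the parameters whose live value differs from the recommendation
function mismatchedParams(expected, actual) {
  return Object.keys(expected).filter((key) => actual[key] !== expected[key]);
}

const expected = { "maxmemory-policy": "noeviction", timeout: "300" };
const live = { "maxmemory-policy": "volatile-lru", timeout: "300" };
mismatchedParams(expected, live); // ["maxmemory-policy"]
```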

Troubleshooting Common Issues

Error 1: ClusterAllFailedError: Failed to refresh slots cache

Full error message:

ClusterAllFailedError: Failed to refresh slots cache

Cause:
You're using new Redis.Cluster() with a Serverless or Node-Based (Cluster Mode Disabled) endpoint. These endpoints do not expose cluster topology information.

Why it happens:

  • Serverless endpoints are proxy-based and hide the internal cluster architecture

  • The Redis Cluster client tries to discover cluster nodes and shard slots

  • This discovery fails because the endpoint doesn't provide cluster topology

Solution:

Use the standard Redis client instead:

// ❌ WRONG - Don't use this with Serverless
const redis = new Redis.Cluster([
  { host: "my-cache.serverless.use1.cache.amazonaws.com", port: 6379 },
]);

// ✅ CORRECT - Use this instead
const redis = new Redis({
  host: "my-cache.serverless.use1.cache.amazonaws.com",
  port: 6379,
  tls: {},
});

Error 2: ETIMEDOUT - Connection Timeout

Full error message:

Error: connect ETIMEDOUT
  at TLSSocket.<anonymous>
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect'

Cause:
Your application cannot reach ElastiCache over the network. This is almost always a networking/security group issue, not a code issue.

90% of the time, the cause is:

  • ElastiCache security group not allowing inbound traffic from your application

  • Application and ElastiCache in different VPCs

  • Missing subnet route configuration

Solution Steps:

Step 1: Verify VPC Configuration

Check that your application and ElastiCache are in the same VPC:

  1. For EC2/ECS/Lambda:

    • Go to EC2 Console → Select your instance

    • Click Networking tab → Note the VPC ID

  2. For ElastiCache:

    • Go to ElastiCache Console → Select your cache

    • Click Details tab → Note the VPC ID

  3. Verify: Both VPC IDs must be identical

If VPCs are different: Connection will always fail. You need to either recreate the cache in the correct VPC or use VPC peering.

Step 2: Configure Security Group (Most Common Fix)

Configure ElastiCache Security Group:

  1. Go to ElastiCache Console

  2. Select your cache → Connectivity & security tab

  3. Click on the Security group link

  4. Click Edit inbound rules

  5. Add a new rule:

    | Type | Protocol | Port Range | Source |
    | --- | --- | --- | --- |
    | Custom TCP | TCP | 6379 | Select Security Group → your EC2/ECS/Lambda security group |

Example:

Type: Custom TCP
Protocol: TCP
Port: 6379
Source: sg-0abc123def456 (your-app-security-group)
Description: Allow Redis traffic from application

Important: Use Security Group ID as the source, not IP addresses. This allows AWS to handle internal routing automatically.
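The same inbound rule can be added from the CLI; the sketch below uses placeholder security group IDs (cache group first, application group as the source):

```shell
# Allow Redis traffic from the application's security group into the cache's group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0cacheGroupId \
  --protocol tcp \
  --port 6379 \
  --source-group sg-0abc123def456
```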

Step 3: Verify Application Security Group (Outbound)

  1. Go to EC2 Console → Security Groups

  2. Select your application's security group

  3. Click Outbound rules tab

  4. Ensure there's a rule allowing outbound traffic:

    | Type | Protocol | Port Range | Destination |
    | --- | --- | --- | --- |
    | All traffic | All | All | 0.0.0.0/0 |

This is usually configured by default, but verify to be sure.

Step 4: Check Subnet Configuration

ElastiCache attaches to private subnets. Verify:

  1. For ElastiCache:

    • ElastiCache Console → Subnet groups

    • Verify subnets have proper route tables

  2. For your application:

    • Must be in subnets that can route to ElastiCache subnets

    • Usually automatic if in the same VPC

Step 5: Test Network Connectivity

SSH into your EC2 instance (or exec into your container) and test connectivity:

# Test with netcat (preferred)
nc -zv your-cache.serverless.use1.cache.amazonaws.com 6379

# Test with telnet
telnet your-cache.serverless.use1.cache.amazonaws.com 6379

Expected output:

Connection to your-cache.serverless.use1.cache.amazonaws.com 6379 port [tcp/*] succeeded!

If you see timeout:

Connection timed out

→ Security group or VPC configuration is still incorrect. Review steps 1-4.

If connection succeeds but your app still fails: → Check your TLS configuration in code (ensure tls: {} is set).

Error 3: Connection Refused

Error message:

Error: connect ECONNREFUSED

Causes:

  1. Wrong hostname or port

  2. ElastiCache is not in "Available" or "Active" status

  3. Using localhost instead of actual endpoint

Solution:

  1. Verify endpoint:

    • Go to ElastiCache Console → Your cache

    • Copy the exact endpoint from Connectivity & security tab

    • Ensure you're using the correct port (default: 6379)

  2. Check cache status:

    • Cache must be in Available (Node-Based) or Active (Serverless) status

    • If status is "Creating" or "Modifying", wait for it to complete

  3. Don't use localhost:

    // ❌ WRONG
    host: "localhost"

    // ✅ CORRECT
    host: "my-cache.serverless.use1.cache.amazonaws.com"
    

Error 4: Cannot Access from Local Development Machine

Cause:
ElastiCache is private by default and only accessible from within the VPC.

Where ElastiCache works:

  • ✅ EC2 instances in the same VPC

  • ✅ ECS tasks in the same VPC

  • ✅ Lambda functions in the same VPC

  • ✅ Other AWS services in the same VPC

Where ElastiCache does NOT work:

  • ❌ Your local development machine

  • ❌ External servers outside AWS

  • ❌ Different VPCs (without VPC peering/transit gateway)

Solutions for local development:

Option 1: Use SSH Tunnel (Recommended)

# Create an SSH tunnel through a bastion/EC2 instance
ssh -i your-key.pem -L 6379:your-cache.serverless.use1.cache.amazonaws.com:6379 ec2-user@your-ec2-ip

Then point your application at the tunnel:

const redis = new Redis({
  host: "localhost",
  port: 6379,
  tls: {
    // TLS is still required; servername avoids a certificate hostname
    // mismatch when connecting through the tunnel
    servername: "your-cache.serverless.use1.cache.amazonaws.com",
  },
});

Option 2: Use a Separate Development Cache

Create a separate ElastiCache instance with a different configuration for development, or use a local Redis instance.

Option 3: Deploy to EC2 for Testing

Deploy your application to an EC2 instance in the same VPC for testing.

Error 5: TLS Handshake Errors

Error message:

Error: unable to verify the first certificate
Error: TLS handshake failed

Cause:
Missing or incorrect TLS configuration.

Solution:

Always include tls: {} in your connection configuration:

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {}, // This is required for AWS ElastiCache
});

If you need to disable TLS verification (not recommended for production):

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {
    rejectUnauthorized: false, // Only for testing
  },
});

Error 6: BullMQ Jobs Disappearing or OOM Errors

Error messages:

OOM command not allowed when used memory > 'maxmemory'

Or jobs silently disappear from queues without processing.

Cause:
ElastiCache is using the default maxmemory-policy of volatile-lru or allkeys-lru, which evicts keys when memory is full. BullMQ requires the noeviction policy to ensure jobs are never lost.

Why it happens:

  • Default parameter groups use eviction policies designed for caching, not queuing

  • When Redis memory fills up, it evicts keys (including your job data)

  • BullMQ jobs are stored as Redis keys, so they can be evicted

Solution:

You must create a custom parameter group with maxmemory-policy set to noeviction. See the complete guide in the Configuring ElastiCache for BullMQ section above.

Quick fix steps:

  1. Create custom parameter group with Redis family matching your cluster

  2. Set maxmemory-policy to noeviction

  3. Apply the parameter group to your cluster

  4. Restart the cluster (required for changes to take effect)

Prevention:

Add this verification to your application startup:

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
});

// Verify on startup
const [, policy] = await redis.config("GET", "maxmemory-policy");
if (policy !== "noeviction") {
  console.error(
    '❌ CRITICAL: maxmemory-policy must be "noeviction" for BullMQ',
  );
  console.error(`Current policy: ${policy}`);
  process.exit(1);
}
console.log("✅ Redis configured correctly for BullMQ");

Important: This issue only affects Node-Based clusters. Serverless ElastiCache cannot be configured with noeviction and is not compatible with BullMQ.

Best Practices

1. Use Environment Variables

Never hardcode connection details:

// ✅ GOOD
const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: parseInt(process.env.REDIS_PORT || "6379"),
  tls: {},
});

// ❌ BAD
const redis = new Redis({
  host: "my-cache.serverless.use1.cache.amazonaws.com",
  port: 6379,
  tls: {},
});

2. Implement Connection Error Handling

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  retryStrategy: (times) => {
    if (times > 3) {
      return null; // Stop retrying after 3 attempts
    }
    return Math.min(times * 200, 2000);
  },
  reconnectOnError: (err) => {
    const targetError = "READONLY";
    if (err.message.includes(targetError)) {
      return true; // Reconnect on specific errors
    }
    return false;
  },
});

redis.on("error", (err) => {
  console.error("Redis error:", err);
  // Send to error tracking service (Sentry, CloudWatch, etc.)
});

redis.on("connect", () => {
  console.log("Redis connected");
});

redis.on("close", () => {
  console.log("Redis connection closed");
});

3. Use Security Group References, Not IP Addresses

When configuring security groups:

✅ GOOD: Source = sg-xxxxx (security group ID)
❌ BAD: Source = 10.0.1.5/32 (IP address)

Security group referencing allows AWS to handle internal IP changes automatically.

4. Enable Encryption for Production

Always enable:

  • Encryption at rest (data stored on disk)

  • Encryption in transit (TLS)

Encryption is configured during cache creation; at-rest encryption cannot be changed afterward, and enabling in-transit encryption later is only possible on some engine versions.

5. Use Multiple Availability Zones

For production environments:

  • Enable multi-AZ deployment

  • Use at least 1 replica node

  • Enables automatic failover

6. Monitor Your Cache

Set up CloudWatch alarms for:

  • CPUUtilization (alert if > 75%)

  • DatabaseMemoryUsagePercentage (alert if > 80%)

  • EngineCPUUtilization (alert if > 75%)

  • NetworkBytesIn/Out

  • CurrConnections

7. Implement Connection Pooling

Reuse Redis connections instead of creating new ones for each request:

// Create once at application startup
const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  maxRetriesPerRequest: 3,
  enableReadyCheck: true,
  lazyConnect: false,
});

// Reuse throughout application
export default redis;

8. Use Appropriate TTLs

Set time-to-live (TTL) for cached data:

// Set with TTL (expires in 1 hour)
await redis.setex("key", 3600, "value");

// Set with TTL using SET command
await redis.set("key", "value", "EX", 3600);

9. Test Network Connectivity During Setup

Before deploying your application, verify connectivity from your compute environment (EC2/ECS/Lambda) to ElastiCache.

10. Document Your Configuration

Keep a record of:

  • VPC ID

  • Subnet group

  • Security groups

  • Node type

  • Cluster/Serverless configuration

  • Backup and maintenance windows

Important Notes

About VPC and Networking

  • ElastiCache is VPC-private by default and cannot be accessed from the internet

  • You cannot change the VPC after cache creation

  • All clients must be in the same VPC (or use VPC peering/transit gateway)

  • Security groups act as firewalls—configure them correctly

About Serverless vs Node-Based

  • Serverless is easier to manage but gives less control

  • Serverless cannot be used with BullMQ (incompatible maxmemory-policy configuration)

  • Node-Based is required for BullMQ and applications needing custom Redis parameters

  • You cannot convert between Serverless and Node-Based after creation

About Cluster Mode

  • Cluster Mode Disabled: Simpler, single endpoint, up to 5 read replicas

  • Cluster Mode Enabled: Better performance for large datasets, multiple shards, requires Redis Cluster client

About TLS/Encryption

  • TLS (in-transit encryption) is highly recommended for production

  • Once set, you cannot disable encryption without recreating the cache

  • Always use tls: {} in your Redis client configuration

About Costs

  • Serverless: Pay for data storage and ECPUs (processing units)

  • Node-Based: Pay for node hours based on instance type

  • Data transfer within the same AZ is free

  • Cross-AZ transfer incurs charges

About Backups

  • Backups are important for production workloads

  • Enable automatic snapshots

  • Backups impact performance slightly during snapshot creation

Last Updated: March 2026
Author: Md Rakibul Islam

This guide is maintained based on actual deployment experiences and common issues encountered in production environments.
