
AWS ElastiCache with Valkey: Complete Setup Guide


This comprehensive guide covers everything you need to know about setting up and connecting to AWS ElastiCache using Valkey (Redis-compatible). It includes step-by-step setup instructions, connection code examples, and solutions to common issues based on real-world experience.


Overview

AWS ElastiCache is a fully managed in-memory caching service that supports Redis-compatible engines like Valkey. It provides high-performance, scalable caching for applications requiring fast data access. ElastiCache offers two main deployment options:

  • Valkey Serverless: Fully managed, auto-scaling option with minimal configuration

  • Valkey Node-Based Cluster: Traditional cluster deployment with more control over configuration

Prerequisites

Before you begin, ensure you have:

  • An active AWS account with appropriate IAM permissions

  • Access to AWS Management Console

  • A configured VPC with appropriate subnets

  • Basic understanding of AWS networking (VPC, Security Groups, Subnets)

  • Node.js application (if connecting from Node.js)

Understanding Deployment Options

Valkey Serverless

Best for:

  • Applications with variable or unpredictable traffic

  • Simplified operations with minimal configuration

  • Auto-scaling requirements

  • Development and testing environments

Key characteristics:

  • Automatically scales based on demand

  • Serverless endpoint (proxy-based)

  • No cluster management required

  • Single Redis client connection (not cluster mode)

  • ⚠️ Not compatible with BullMQ (cannot configure maxmemory-policy)

Valkey Node-Based Cluster

Best for:

  • Applications requiring specific node configurations

  • BullMQ or other queue systems (requires node-based deployment with custom parameters)

  • Predictable workloads with known capacity

  • Fine-grained control over caching infrastructure

Key characteristics:

  • Manual cluster configuration

  • Support for cluster mode enabled/disabled

  • Direct node access

  • More configuration options

Important for BullMQ Users: If you plan to use BullMQ with Node.js, you must choose the Node-Based Cluster deployment option. BullMQ requires:

  • Direct node access (not available in Serverless)

  • Custom maxmemory-policy set to noeviction (cannot be configured in Serverless)

  • See the Configuring ElastiCache for BullMQ section for complete setup instructions.

Creating Valkey Serverless Cache

Follow these steps to create a Valkey Serverless cache:

Step 1: Access ElastiCache Console

  1. Sign in to the AWS Management Console

  2. Navigate to ElastiCache Console

  3. In the left navigation pane, select Valkey caches

  4. Click Create Valkey cache button

Step 2: Configure Cache Settings

  1. Deployment option: Select Serverless (default)

  2. Cache settings:

    • Name: Enter a descriptive name (e.g., my-project-cache)

    • Description: (Optional) Add a description for your cache

  3. Configuration: Leave the default settings selected for initial setup

  4. Network settings: ElastiCache will automatically configure networking

Step 3: Create and Wait

  1. Review your configuration

  2. Click Create to create the cache

  3. Wait for the cache status to change to ACTIVE (typically takes 5-10 minutes)

  4. Once active, you can retrieve the endpoint URL from the cache details page

Step 4: Get Connection Endpoint

After the cache is created:

  1. Select your cache from the list

  2. Go to the Connectivity & security tab

  3. Copy the Configuration endpoint (e.g., my-cache.serverless.use1.cache.amazonaws.com)
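If you prefer the CLI, the endpoint can also be retrieved with the AWS CLI (this is a sketch that assumes the `my-project-cache` name from Step 2 and a configured AWS account):

```shell
# Fetch the serverless cache endpoint address
aws elasticache describe-serverless-caches \
  --serverless-cache-name my-project-cache \
  --query "ServerlessCaches[0].Endpoint.Address" \
  --output text
```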

Creating Valkey Node-Based Cluster

Follow these steps to create a Valkey Node-Based Cluster:

Step 1: Access ElastiCache Console

  1. Sign in to the AWS Management Console

  2. Navigate to ElastiCache Console

  3. In the left navigation pane, select Valkey caches

  4. Click Create Valkey cache button

Step 2: Select Deployment Option

  1. Deployment option: Select Design your own cache

  2. Creation method: Select Cluster cache

  3. Cluster mode: Choose Disabled (for simpler setup) or Enabled (for sharding)

Step 3: Configure Cluster Settings

Cluster Information:

  • Name: Enter a cluster name (e.g., my-project-cluster)

  • Description: (Optional) Add a description

  • Engine version: Use the latest compatible version

  • Port: Keep default 6379

  • Parameter group: Use default or select custom

  • Node type: Choose based on your memory and CPU requirements (e.g., cache.t3.micro for testing)

  • Number of replicas: Set to 0 for single node, or add replicas for high availability

Step 4: Configure Subnet Group

In the Connectivity section:

  1. Subnet groups:

    • If you don't have a subnet group, select Create a new subnet group

    • Name: Enter subnet group name (e.g., my-subnet-group)

    • Description: Add a description

    • VPC: Select your VPC from the dropdown

    • Subnets: Select at least 2 subnets in different availability zones

  2. Click Next

Step 5: Configure Security Settings

  1. In the Selected security groups section, click Manage

  2. Select appropriate security groups:

    • Choose existing security group OR

    • Create a new security group with inbound rule allowing port 6379

  3. Encryption:

    • Enable Encryption at rest (recommended for production)

    • Enable Encryption in transit (TLS) (recommended)

Step 6: Configure Backup and Maintenance (Optional)

  1. Automatic backups: Enable for production environments

  2. Maintenance window: Choose preferred maintenance window

  3. SNS notifications: (Optional) Configure notifications

Step 7: Review and Create

  1. Click Next to review all settings

  2. Verify your configuration

  3. Click Create to create the cluster

  4. Wait for the cluster status to become Available (typically 10-15 minutes)

Step 8: Get Connection Endpoint

After the cluster is created:

  1. Select your cluster from the list

  2. Go to the Details tab

  3. Copy the Primary endpoint (or Configuration endpoint if cluster mode is enabled)

Connecting to ElastiCache

Understanding Connection Types

AWS ElastiCache has three different Redis connection patterns:

| Deployment Type | Correct Client | Wrong Client |
| --- | --- | --- |
| Self-managed Redis | new Redis() | - |
| ElastiCache Node-Based (Cluster Mode Disabled) | new Redis() | new Redis.Cluster() |
| ElastiCache Serverless | new Redis() | new Redis.Cluster() |
| ElastiCache Node-Based (Cluster Mode Enabled) | new Redis.Cluster() | new Redis() |

Critical: Serverless endpoints use a proxy architecture and do NOT expose individual cluster nodes. Always use new Redis() for Serverless, never new Redis.Cluster().
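As a sanity check, the client-selection rule above can be captured in a tiny helper (the deployment labels below are made up for illustration):

```javascript
// Encodes the table above: only a node-based cluster with cluster mode
// ENABLED needs the cluster client; everything else uses new Redis().
function clientFor(deployment) {
  // deployment: "self-managed" | "serverless" | "node-cmd-disabled" | "node-cmd-enabled"
  return deployment === "node-cmd-enabled" ? "Redis.Cluster" : "Redis";
}

clientFor("serverless"); // "Redis" - never Redis.Cluster for Serverless
```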

Connecting to Valkey Serverless

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST, // e.g., my-cache.serverless.use1.cache.amazonaws.com
  port: 6379,
  tls: {}, // TLS is required for AWS ElastiCache
  connectTimeout: 10000,
  maxRetriesPerRequest: null, // Important for BullMQ
});

// Event listeners for monitoring
redis.on("connect", () => {
  console.log("✅ Redis connected successfully");
});

redis.on("error", (err) => {
  console.error("❌ Redis connection error:", err);
});

redis.on("close", () => {
  console.log("Redis connection closed");
});

// Example usage
async function testConnection() {
  try {
    // Set a value
    await redis.set("test-key", "Hello ElastiCache!");
    console.log("✅ Set operation successful");

    // Get the value
    const value = await redis.get("test-key");
    console.log("✅ Retrieved value:", value);

    // Clean up
    await redis.del("test-key");
  } catch (error) {
    console.error("❌ Operation failed:", error);
  }
}

testConnection();

Connecting to Valkey Node-Based Cluster (Cluster Mode Disabled)

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST, // Primary endpoint
  port: 6379,
  tls: {},
  connectTimeout: 10000,
  retryStrategy: (times) => {
    const delay = Math.min(times * 50, 2000);
    return delay;
  },
});

redis.on("connect", () => console.log("Redis connected"));
redis.on("error", (err) => console.error("Redis error:", err));

Connecting to Valkey Node-Based Cluster (Cluster Mode Enabled)

import Redis from "ioredis";

const cluster = new Redis.Cluster(
  [
    {
      host: process.env.REDIS_HOST, // Configuration endpoint
      port: 6379,
    },
  ],
  {
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
      tls: {},
      connectTimeout: 10000,
    },
    clusterRetryStrategy: (times) => {
      const delay = Math.min(times * 50, 2000);
      return delay;
    },
  },
);

cluster.on("connect", () => console.log("Cluster connected"));
cluster.on("error", (err) => console.error("Cluster error:", err));

Using with BullMQ

import { Queue, Worker } from "bullmq";
import Redis from "ioredis";

// Connection configuration
const connection = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  maxRetriesPerRequest: null, // Critical for BullMQ
});

// Create a queue
const queue = new Queue("my-queue", { connection });

// Add a job
await queue.add("job-name", { data: "example" });

// Create a worker
const worker = new Worker(
  "my-queue",
  async (job) => {
    console.log("Processing job:", job.id);
    // Process job here
  },
  { connection },
);

Configuring ElastiCache for BullMQ

If you're using BullMQ with AWS ElastiCache, there's a critical configuration requirement you must complete before your queues will work properly.

Why This Configuration Is Required

BullMQ requires Redis to use the noeviction maxmemory policy. This policy ensures that Redis never evicts keys when memory is full, which is essential for queue reliability. If keys are evicted, you could lose jobs from your queue.

Important Notes:

  • ⚠️ Serverless ElastiCache is NOT compatible with BullMQ due to an incompatible default maxmemory-policy that cannot be changed

  • ✅ You must use Node-Based Cluster deployment for BullMQ

  • Default parameter groups in AWS cannot be modified, so you must create a custom parameter group

Common Error Without This Configuration

Without the correct maxmemory-policy, you may encounter errors such as:

OOM command not allowed when used memory > 'maxmemory'

Or jobs may silently disappear from your queue when memory pressure occurs.

Step-by-Step Configuration Guide

Step 1: Create a Custom Parameter Group

  1. Navigate to ElastiCache Console → Parameter Groups (in the left sidebar)

  2. Click Create parameter group

  3. Configure the parameter group:

    • Family: Select the engine family that matches your cluster (e.g., valkey7 for Valkey 7, or redis7 for Redis 7)

    • Name: Enter a descriptive name (e.g., bullmq-parameters)

    • Description: Add a description (e.g., Custom parameters for BullMQ queues)

  4. Click Create

Step 2: Modify the maxmemory-policy Parameter

  1. In the Parameter Groups list, find your newly created parameter group

  2. Click on the parameter group name to open it

  3. Click Edit or Edit parameters

  4. In the search box, type: maxmemory-policy

  5. Change the value from volatile-lru (default) to noeviction

  6. Click Save changes

Step 3: Apply the Custom Parameter Group to Your Cluster

For existing clusters:

  1. Go to ElastiCache Console → Redis caches (or Valkey caches)

  2. Select your cluster by clicking the checkbox

  3. Click Modify

  4. Scroll down to Cluster settings section

  5. In the Parameter group dropdown, select your custom parameter group (e.g., bullmq-parameters)

  6. Scroll to the bottom and click Preview changes

  7. Review the changes

  8. Click Modify to apply

For new clusters:

During cluster creation (Step 3 of "Creating Valkey Node-Based Cluster"):

  • In the Cluster settings section

  • Find the Parameter group field

  • Select your custom parameter group from the dropdown

Step 4: Restart Required (For Existing Clusters)

⚠️ Important: Changing the parameter group requires a cluster restart for the changes to take effect.

  1. After modifying, AWS will schedule the change

  2. Choose to apply the change:

    • Immediately: Cluster will restart now (brief downtime)

    • During maintenance window: Applied during next maintenance window

  3. Monitor the cluster status until it returns to Available

Step 5: Verify the Configuration

After the cluster is available, verify the configuration:

Option 1: Using Redis CLI from EC2:

# Connect to your ElastiCache instance
redis-cli -h your-cache.region.cache.amazonaws.com -p 6379 --tls

# Check the maxmemory-policy
CONFIG GET maxmemory-policy

Expected output:

1) "maxmemory-policy"
2) "noeviction"

Option 2: Using ioredis in your application:

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
});

async function verifyConfig() {
  const policy = await redis.config("GET", "maxmemory-policy");
  console.log("maxmemory-policy:", policy[1]); // Should output: noeviction

  if (policy[1] !== "noeviction") {
    console.error("⚠️ WARNING: maxmemory-policy is not set to noeviction!");
    console.error("BullMQ may not work correctly.");
  } else {
    console.log("✅ Configuration is correct for BullMQ");
  }
}

verifyConfig();

Complete BullMQ Setup Example

Once your parameter group is configured correctly:

import { Queue, Worker, QueueEvents } from "bullmq";
import Redis from "ioredis";

// Create connection with BullMQ-optimized settings
const connection = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  maxRetriesPerRequest: null, // Required for BullMQ
  enableReadyCheck: false,
  maxLoadingRetryTime: 5000,
});

// Verify configuration on startup
connection.config("GET", "maxmemory-policy").then(([, policy]) => {
  if (policy !== "noeviction") {
    console.error(
      '❌ CRITICAL: maxmemory-policy must be "noeviction" for BullMQ',
    );
    process.exit(1);
  }
  console.log("✅ Redis configuration verified for BullMQ");
});

// Create a queue
const myQueue = new Queue("my-queue", {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: "exponential",
      delay: 1000,
    },
    removeOnComplete: {
      count: 100, // Keep last 100 completed jobs
      age: 3600, // Keep jobs for 1 hour
    },
    removeOnFail: {
      count: 500, // Keep last 500 failed jobs
    },
  },
});

// Create a worker
const worker = new Worker(
  "my-queue",
  async (job) => {
    console.log(`Processing job ${job.id} with data:`, job.data);
    // Your job processing logic here
    return { success: true };
  },
  {
    connection: connection.duplicate(), // Important: use duplicate connection
    concurrency: 10,
    limiter: {
      max: 100,
      duration: 1000,
    },
  },
);

// Event listeners
worker.on("completed", (job) => {
  console.log(`✅ Job ${job.id} completed`);
});

worker.on("failed", (job, err) => {
  console.error(`❌ Job ${job.id} failed:`, err.message);
});

// Queue events for monitoring
const queueEvents = new QueueEvents("my-queue", { connection });

queueEvents.on("waiting", ({ jobId }) => {
  console.log(`Job ${jobId} is waiting`);
});

// Add jobs to the queue
async function addJobs() {
  await myQueue.add("process-data", { userId: 123, action: "process" });
  await myQueue.add("send-email", { to: "user@example.com", subject: "Hello" });
  console.log("Jobs added to queue");
}

addJobs();

// Graceful shutdown
process.on("SIGTERM", async () => {
  console.log("Shutting down...");
  await worker.close();
  await myQueue.close();
  await queueEvents.close();
  await connection.quit();
  process.exit(0);
});
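With attempts: 3 and the exponential backoff configured above, BullMQ spaces retries using its documented formula, delay * 2^(attemptsMade - 1); for the 1000 ms base delay that works out to:

```javascript
// BullMQ's exponential backoff formula: delay * 2^(attemptsMade - 1)
const backoffDelay = (baseMs, attemptsMade) => baseMs * 2 ** (attemptsMade - 1);

backoffDelay(1000, 1); // 1000 ms before the first retry
backoffDelay(1000, 2); // 2000 ms before the second (and final) retry
```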

Best Practices for BullMQ with ElastiCache

  1. Always verify maxmemory-policy on startup - Add a check in your application initialization

  2. Leave memory headroom - ElastiCache fixes maxmemory based on the node type; tune the reserved-memory-percent parameter (25% is a common starting point) to leave room for backups and failover overhead

  3. Monitor memory usage - Set up CloudWatch alarms for memory usage

  4. Use job retention policies - Configure removeOnComplete and removeOnFail to prevent memory bloat

  5. Duplicate connections for workers - Use connection.duplicate() for workers to avoid connection issues

  6. Plan for durability - AOF (Append Only File) is not supported on recent ElastiCache engine versions, so rely on automatic backups and Multi-AZ replicas for queue durability

  7. Test failover scenarios - If using replicas, test that your application handles failover correctly

Quick Reference: Parameter Group Settings for BullMQ

| Parameter | Recommended Value | Reason |
| --- | --- | --- |
| maxmemory-policy | noeviction | Required - prevents job loss |
| reserved-memory-percent | 25 | Leaves headroom for overhead (maxmemory itself is fixed by node type) |
| timeout | 300 | Close idle connections after 5 minutes |
| tcp-keepalive | 300 | Keep connections alive |
| appendonly | yes (where supported) | Persistence for queue durability; unavailable on recent engine versions |
| appendfsync | everysec (optional) | Balance between performance and safety |
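A startup check can compare live CONFIG GET output against the values in this table. The helper below is a hypothetical sketch (parameter names as in the table, values as the strings Redis returns):

```javascript
// Returns the parameters whose live value differs from the recommendation
function mismatchedParams(expected, actual) {
  return Object.keys(expected).filter((key) => actual[key] !== expected[key]);
}

const expected = { "maxmemory-policy": "noeviction", timeout: "300" };
const live = { "maxmemory-policy": "volatile-lru", timeout: "300" };
mismatchedParams(expected, live); // ["maxmemory-policy"]
```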

Troubleshooting Common Issues

Error 1: ClusterAllFailedError: Failed to refresh slots cache

Full error message:

ClusterAllFailedError: Failed to refresh slots cache

Cause:
You're using new Redis.Cluster() with a Serverless or Node-Based (Cluster Mode Disabled) endpoint. These endpoints do not expose cluster topology information.

Why it happens:

  • Serverless endpoints are proxy-based and hide the internal cluster architecture

  • The Redis Cluster client tries to discover cluster nodes and shard slots

  • This discovery fails because the endpoint doesn't provide cluster topology

Solution:

Use the standard Redis client instead:

// ❌ WRONG - Don't use this with Serverless
const redis = new Redis.Cluster([
  { host: "my-cache.serverless.use1.cache.amazonaws.com", port: 6379 },
]);

// ✅ CORRECT - Use this instead
const redis = new Redis({
  host: "my-cache.serverless.use1.cache.amazonaws.com",
  port: 6379,
  tls: {},
});

Error 2: ETIMEDOUT - Connection Timeout

Full error message:

Error: connect ETIMEDOUT
  at TLSSocket.<anonymous>
  errno: 'ETIMEDOUT',
  code: 'ETIMEDOUT',
  syscall: 'connect'

Cause:
Your application cannot reach ElastiCache over the network. This is almost always a networking/security group issue, not a code issue.

90% of the time, the cause is:

  • ElastiCache security group not allowing inbound traffic from your application

  • Application and ElastiCache in different VPCs

  • Missing subnet route configuration

Solution Steps:

Step 1: Verify VPC Configuration

Check that your application and ElastiCache are in the same VPC:

  1. For EC2/ECS/Lambda:

    • Go to EC2 Console → Select your instance

    • Click Networking tab → Note the VPC ID

  2. For ElastiCache:

    • Go to ElastiCache Console → Select your cache

    • Click Details tab → Note the VPC ID

  3. Verify: Both VPC IDs must be identical

If VPCs are different: Connection will always fail. You need to either recreate the cache in the correct VPC or use VPC peering.

Step 2: Configure Security Group (Most Common Fix)

Configure ElastiCache Security Group:

  1. Go to ElastiCache Console

  2. Select your cache → Connectivity & security tab

  3. Click on the Security group link

  4. Click Edit inbound rules

  5. Add a new rule:

    | Type | Protocol | Port Range | Source |
    | --- | --- | --- | --- |
    | Custom TCP | TCP | 6379 | Select Security Group → your EC2/ECS/Lambda security group |

Example:

Type: Custom TCP
Protocol: TCP
Port: 6379
Source: sg-0abc123def456 (your-app-security-group)
Description: Allow Redis traffic from application

Important: Use Security Group ID as the source, not IP addresses. This allows AWS to handle internal routing automatically.
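The same inbound rule can be added from the CLI; the sketch below uses placeholder security group IDs (cache group first, application group as the source):

```shell
# Allow Redis traffic from the application's security group into the cache's group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0cacheGroupId \
  --protocol tcp \
  --port 6379 \
  --source-group sg-0abc123def456
```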

Step 3: Verify Application Security Group (Outbound)

  1. Go to EC2 Console → Security Groups

  2. Select your application's security group

  3. Click Outbound rules tab

  4. Ensure there's a rule allowing outbound traffic:

    | Type | Protocol | Port Range | Destination |
    | --- | --- | --- | --- |
    | All traffic | All | All | 0.0.0.0/0 |

This is usually configured by default, but verify to be sure.

Step 4: Check Subnet Configuration

ElastiCache attaches to private subnets. Verify:

  1. For ElastiCache:

    • ElastiCache Console → Subnet groups

    • Verify subnets have proper route tables

  2. For your application:

    • Must be in subnets that can route to ElastiCache subnets

    • Usually automatic if in the same VPC

Step 5: Test Network Connectivity

SSH into your EC2 instance (or exec into your container) and test connectivity:

# Test with netcat (preferred)
nc -zv your-cache.serverless.use1.cache.amazonaws.com 6379

# Test with telnet
telnet your-cache.serverless.use1.cache.amazonaws.com 6379

Expected output:

Connection to your-cache.serverless.use1.cache.amazonaws.com 6379 port [tcp/*] succeeded!

If you see timeout:

Connection timed out

→ Security group or VPC configuration is still incorrect. Review steps 1-4.

If connection succeeds but your app still fails: → Check your TLS configuration in code (ensure tls: {} is set).

Error 3: Connection Refused

Error message:

Error: connect ECONNREFUSED

Causes:

  1. Wrong hostname or port

  2. ElastiCache is not in "Available" or "Active" status

  3. Using localhost instead of actual endpoint

Solution:

  1. Verify endpoint:

    • Go to ElastiCache Console → Your cache

    • Copy the exact endpoint from Connectivity & security tab

    • Ensure you're using the correct port (default: 6379)

  2. Check cache status:

    • Cache must be in Available (Node-Based) or Active (Serverless) status

    • If status is "Creating" or "Modifying", wait for it to complete

  3. Don't use localhost:

    // ❌ WRONG
    host: "localhost"

    // ✅ CORRECT
    host: "my-cache.serverless.use1.cache.amazonaws.com"
    

Error 4: Cannot Access from Local Development Machine

Cause:
ElastiCache is private by default and only accessible from within the VPC.

Where ElastiCache works:

  • ✅ EC2 instances in the same VPC

  • ✅ ECS tasks in the same VPC

  • ✅ Lambda functions in the same VPC

  • ✅ Other AWS services in the same VPC

Where ElastiCache does NOT work:

  • ❌ Your local development machine

  • ❌ External servers outside AWS

  • ❌ Different VPCs (without VPC peering/transit gateway)

Solutions for local development:

Option 1: Use SSH Tunnel (Recommended)

# Create an SSH tunnel through a bastion/EC2 instance
ssh -i your-key.pem -L 6379:your-cache.serverless.use1.cache.amazonaws.com:6379 ec2-user@your-ec2-ip

Then point your application at the tunnel:

const redis = new Redis({
  host: "localhost",
  port: 6379,
  tls: {
    // TLS is still required; servername avoids a certificate hostname
    // mismatch when connecting through the tunnel
    servername: "your-cache.serverless.use1.cache.amazonaws.com",
  },
});

Option 2: Use a Separate Development Cache

Create a separate ElastiCache instance with a different configuration for development, or use a local Redis instance.

Option 3: Deploy to EC2 for Testing

Deploy your application to an EC2 instance in the same VPC for testing.

Error 5: TLS Handshake Errors

Error message:

Error: unable to verify the first certificate
Error: TLS handshake failed

Cause:
Missing or incorrect TLS configuration.

Solution:

Always include tls: {} in your connection configuration:

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {}, // This is required for AWS ElastiCache
});

If you need to disable TLS verification (not recommended for production):

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {
    rejectUnauthorized: false, // Only for testing
  },
});

Error 6: BullMQ Jobs Disappearing or OOM Errors

Error messages:

OOM command not allowed when used memory > 'maxmemory'

Or jobs silently disappear from queues without processing.

Cause:
ElastiCache is using the default maxmemory-policy of volatile-lru or allkeys-lru, which evicts keys when memory is full. BullMQ requires the noeviction policy to ensure jobs are never lost.

Why it happens:

  • Default parameter groups use eviction policies designed for caching, not queuing

  • When Redis memory fills up, it evicts keys (including your job data)

  • BullMQ jobs are stored as Redis keys, so they can be evicted

Solution:

You must create a custom parameter group with maxmemory-policy set to noeviction. See the complete guide in the Configuring ElastiCache for BullMQ section above.

Quick fix steps:

  1. Create custom parameter group with Redis family matching your cluster

  2. Set maxmemory-policy to noeviction

  3. Apply the parameter group to your cluster

  4. Restart the cluster (required for changes to take effect)

Prevention:

Add this verification to your application startup:

import Redis from "ioredis";

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
});

// Verify on startup
const [, policy] = await redis.config("GET", "maxmemory-policy");
if (policy !== "noeviction") {
  console.error(
    '❌ CRITICAL: maxmemory-policy must be "noeviction" for BullMQ',
  );
  console.error(`Current policy: ${policy}`);
  process.exit(1);
}
console.log("✅ Redis configured correctly for BullMQ");

Important: This issue only affects Node-Based clusters. Serverless ElastiCache cannot be configured with noeviction and is not compatible with BullMQ.

Best Practices

1. Use Environment Variables

Never hardcode connection details:

// ✅ GOOD
const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: parseInt(process.env.REDIS_PORT || "6379"),
  tls: {},
});

// ❌ BAD
const redis = new Redis({
  host: "my-cache.serverless.use1.cache.amazonaws.com",
  port: 6379,
  tls: {},
});

2. Implement Connection Error Handling

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  retryStrategy: (times) => {
    if (times > 3) {
      return null; // Stop retrying after 3 attempts
    }
    return Math.min(times * 200, 2000);
  },
  reconnectOnError: (err) => {
    const targetError = "READONLY";
    if (err.message.includes(targetError)) {
      return true; // Reconnect on specific errors
    }
    return false;
  },
});

redis.on("error", (err) => {
  console.error("Redis error:", err);
  // Send to error tracking service (Sentry, CloudWatch, etc.)
});

redis.on("connect", () => {
  console.log("Redis connected");
});

redis.on("close", () => {
  console.log("Redis connection closed");
});

3. Use Security Group References, Not IP Addresses

When configuring security groups:

✅ GOOD: Source = sg-xxxxx (security group ID)
❌ BAD: Source = 10.0.1.5/32 (IP address)

Security group referencing allows AWS to handle internal IP changes automatically.

4. Enable Encryption for Production

Always enable:

  • Encryption at rest (data stored on disk)

  • Encryption in transit (TLS)

Encryption is configured during cache creation; at-rest encryption cannot be changed afterward, and enabling in-transit encryption later is only possible on some engine versions.

5. Use Multiple Availability Zones

For production environments:

  • Enable multi-AZ deployment

  • Use at least 1 replica node

  • Enables automatic failover

6. Monitor Your Cache

Set up CloudWatch alarms for:

  • CPUUtilization (alert if > 75%)

  • DatabaseMemoryUsagePercentage (alert if > 80%)

  • EngineCPUUtilization (alert if > 75%)

  • NetworkBytesIn/Out

  • CurrConnections

7. Implement Connection Pooling

Reuse Redis connections instead of creating new ones for each request:

// Create once at application startup
const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  tls: {},
  maxRetriesPerRequest: 3,
  enableReadyCheck: true,
  lazyConnect: false,
});

// Reuse throughout application
export default redis;

8. Use Appropriate TTLs

Set time-to-live (TTL) for cached data:

// Set with TTL (expires in 1 hour)
await redis.setex("key", 3600, "value");

// Set with TTL using SET command
await redis.set("key", "value", "EX", 3600);

9. Test Network Connectivity During Setup

Before deploying your application, verify connectivity from your compute environment (EC2/ECS/Lambda) to ElastiCache.

10. Document Your Configuration

Keep a record of:

  • VPC ID

  • Subnet group

  • Security groups

  • Node type

  • Cluster/Serverless configuration

  • Backup and maintenance windows

Important Notes

About VPC and Networking

  • ElastiCache is VPC-private by default and cannot be accessed from the internet

  • You cannot change the VPC after cache creation

  • All clients must be in the same VPC (or use VPC peering/transit gateway)

  • Security groups act as firewalls—configure them correctly

About Serverless vs Node-Based

  • Serverless is easier to manage but gives less control

  • Serverless cannot be used with BullMQ (incompatible maxmemory-policy configuration)

  • Node-Based is required for BullMQ and applications needing custom Redis parameters

  • You cannot convert between Serverless and Node-Based after creation

About Cluster Mode

  • Cluster Mode Disabled: Simpler, single endpoint, up to 5 read replicas

  • Cluster Mode Enabled: Better performance for large datasets, multiple shards, requires Redis Cluster client

About TLS/Encryption

  • TLS (in-transit encryption) is highly recommended for production

  • Once set, you cannot disable encryption without recreating the cache

  • Always use tls: {} in your Redis client configuration

About Costs

  • Serverless: Pay for data storage and ECPUs (processing units)

  • Node-Based: Pay for node hours based on instance type

  • Data transfer within the same AZ is free

  • Cross-AZ transfer incurs charges

About Backups

  • Backups are important for production workloads

  • Enable automatic snapshots

  • Backups impact performance slightly during snapshot creation

Last Updated: March 2026
Author: Md Rakibul Islam

This guide is maintained based on actual deployment experiences and common issues encountered in production environments.
