coderain blog

How to Use AWS DynamoDB Scan with FilterExpression for Array of Hash Values (JavaScript SDK Example)

Amazon DynamoDB is a fully managed NoSQL database service known for its scalability, high performance, and low latency. When working with DynamoDB, developers often need to retrieve specific items from a table. While the Query operation is ideal for fetching items using a known partition key (hash key), there are scenarios where you may need to retrieve items based on a list of partition keys (e.g., fetching user profiles for a list of user IDs).

In such cases, the Scan operation combined with a FilterExpression can be used to filter results based on an array of hash values. However, Scan is a full-table scan (or index scan) and should be used judiciously due to cost and performance considerations.

This blog post will guide you through using Scan with FilterExpression to retrieve items where the partition key exists in an array of values, using the AWS SDK for JavaScript (v3). We’ll cover setup, implementation, pagination, best practices, and troubleshooting.

2026-01

Table of Contents#

  1. Prerequisites
  2. Understanding DynamoDB Scan and FilterExpression
  3. Setting Up the DynamoDB Table
  4. Example Scenario
  5. Implementing the Scan with FilterExpression (JavaScript SDK v3)
  6. Step-by-Step Explanation
  7. Handling Large Result Sets (Pagination)
  8. Best Practices
  9. Troubleshooting Common Issues
  10. Conclusion
  11. References

Prerequisites#

Before getting started, ensure you have the following:

  • An AWS account with access to DynamoDB.
  • AWS CLI configured with appropriate permissions (or an IAM role with dynamodb:Scan permissions).
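  • Node.js installed. Use a currently supported LTS release: recent versions of the AWS SDK for JavaScript v3 no longer support Node.js 14 or 16.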
  • AWS SDK for JavaScript v3 (@aws-sdk/client-dynamodb package).
  • Basic familiarity with DynamoDB concepts (partition keys, items, attributes) and JavaScript.

Understanding DynamoDB Scan and FilterExpression#

What is Scan?#

The Scan operation reads every item in a DynamoDB table (or index) and returns results that match optional filter criteria. Unlike Query, which requires a specific partition key, Scan iterates over the entire dataset. This makes it less efficient for large tables but useful when you need to filter across multiple partition keys.

What is FilterExpression?#

FilterExpression is an optional parameter for Scan that filters results after the items have been read. It reduces the number of items returned to your application, but not the amount of data scanned or the read capacity consumed. For our use case, we’ll use the IN operator, which accepts a list of up to 100 values, to check whether the partition key matches any value in an array.

Key Considerations:#

  • Cost: Scan consumes read capacity units (RCUs) based on the total size of the data scanned (not the filtered result). For large tables, this can be expensive.
  • Performance: Scan is slower than Query for large datasets, as it scans the entire table.
  • Filtering: FilterExpression is applied post-scan, so it does not reduce the amount of data processed by DynamoDB (only the data returned to you).
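To make the gap between data scanned and data returned visible, a Scan response reports both Count (items that passed the filter) and ScannedCount (items read before filtering). The sketch below summarizes those two fields. The helper name summarizeScan and the sample response object are assumptions made for illustration, not part of the SDK.

```javascript
// Hypothetical helper: summarize how selective one Scan page was.
// `response` is shaped like the object returned by client.send(new ScanCommand(...)).
function summarizeScan(response) {
  const returned = response.Count ?? 0;       // items that passed FilterExpression
  const scanned = response.ScannedCount ?? 0; // items read before filtering
  return {
    returned,
    scanned,
    discarded: scanned - returned,
    // Fraction of the items read that were actually returned
    selectivity: scanned === 0 ? 0 : returned / scanned,
  };
}

// Illustrative response: 3 matches out of 4 items read
const sample = { Count: 3, ScannedCount: 4, Items: [] };
console.log(JSON.stringify(summarizeScan(sample)));
// {"returned":3,"scanned":4,"discarded":1,"selectivity":0.75}
```

A low selectivity on a large table is a hint that most of the read capacity is being spent on items that are then thrown away.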

Setting Up the DynamoDB Table#

To follow along, let’s create a sample DynamoDB table and populate it with test data. We’ll use a Users table with:

  • Table Name: Users
  • Partition Key: userId (string)
  • Attributes: name (string), email (string), status (string)

Step 1: Create the Table (via AWS CLI)#

Run this command in your terminal to create the table:

aws dynamodb create-table \
  --table-name Users \
  --attribute-definitions AttributeName=userId,AttributeType=S \
  --key-schema AttributeName=userId,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

Step 2: Insert Sample Items#

Add a few test items using the AWS CLI or DynamoDB Console:

# Add user123
aws dynamodb put-item \
  --table-name Users \
  --item '{"userId": {"S": "user123"}, "name": {"S": "Alice Smith"}, "email": {"S": "[email protected]"}, "status": {"S": "active"}}'
 
# Add user456
aws dynamodb put-item \
  --table-name Users \
  --item '{"userId": {"S": "user456"}, "name": {"S": "Bob Johnson"}, "email": {"S": "[email protected]"}, "status": {"S": "inactive"}}'
 
# Add user789
aws dynamodb put-item \
  --table-name Users \
  --item '{"userId": {"S": "user789"}, "name": {"S": "Charlie Brown"}, "email": {"S": "[email protected]"}, "status": {"S": "active"}}'
 
# Add user000 (to test filtering)
aws dynamodb put-item \
  --table-name Users \
  --item '{"userId": {"S": "user000"}, "name": {"S": "Diana Prince"}, "email": {"S": "[email protected]"}, "status": {"S": "active"}}'

Example Scenario#

Suppose we need to retrieve user data for a list of userId values: ["user123", "user456", "user789"]. We’ll use Scan with FilterExpression to fetch only these users.

Implementing the Scan with FilterExpression (JavaScript SDK v3)#

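We’ll use the AWS SDK for JavaScript v3 to implement the Scan operation. The example uses ES module import syntax, so save it as a .mjs file or set "type": "module" in your package.json. First, install the required package: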

npm install @aws-sdk/client-dynamodb

Full Code Example#

import { DynamoDBClient, ScanCommand } from "@aws-sdk/client-dynamodb";
 
// Initialize DynamoDB client
const client = new DynamoDBClient({ region: "us-east-1" }); // Replace with your region
 
// Array of user IDs to filter
const targetUserIds = ["user123", "user456", "user789"];
 
const scanItems = async () => {
  try {
    // Build FilterExpression and ExpressionAttributeValues dynamically
    // (IN needs at least one value and accepts at most 100)
    if (targetUserIds.length === 0) return [];
    const placeholders = targetUserIds.map((_, index) => `:id${index}`).join(", ");
    const filterExpression = `userId IN (${placeholders})`;
 
    const expressionAttributeValues = targetUserIds.reduce((acc, id, index) => {
      acc[`:id${index}`] = { S: id }; // "S" denotes string type
      return acc;
    }, {});
 
    // Define Scan parameters
    const params = {
      TableName: "Users",
      FilterExpression: filterExpression,
      ExpressionAttributeValues: expressionAttributeValues,
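      // Optional: Limit returned attributes with ProjectionExpression.
      // "name" is a DynamoDB reserved word, so it needs an alias:
      // ProjectionExpression: "userId, #name, email",
      // ExpressionAttributeNames: { "#name": "name" },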
    };
 
    // Execute Scan
    const command = new ScanCommand(params);
    const response = await client.send(command);
 
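    // A single Scan call reads at most 1 MB of the table; see the pagination section for larger tables.
    // Convert DynamoDB attribute values to plain values (scalar types only)
    const items = (response.Items ?? []).map((item) => {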
      return Object.entries(item).reduce((acc, [key, value]) => {
        acc[key] = Object.values(value)[0]; // Extract value (e.g., { S: "user123" } → "user123")
        return acc;
      }, {});
    });
 
    console.log("Filtered Users:", items);
    return items;
 
  } catch (error) {
    console.error("Error scanning items:", error);
    throw error;
  }
};
 
// Run the function
scanItems();

Step-by-Step Explanation#

1. Initialize the DynamoDB Client#

We start by initializing a DynamoDBClient with our AWS region (e.g., us-east-1).

2. Define the Target Array#

We specify the array of userId values we want to filter: ["user123", "user456", "user789"].

3. Build FilterExpression Dynamically#

Since the array length may vary, we dynamically generate the FilterExpression using placeholders (e.g., :id0, :id1). For targetUserIds = ["user123", "user456", "user789"], the FilterExpression becomes:

userId IN (:id0, :id1, :id2)
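The same construction can be wrapped in a small reusable function. The sketch below is one possible shape, assuming string-typed values; the name buildInFilter and the #attr alias are hypothetical. It also guards the two edge cases the inline version ignores: an empty list, and a list longer than the 100 values that IN accepts.

```javascript
// Hypothetical helper: build the pieces of an IN filter for a list of string values.
function buildInFilter(attributeName, values) {
  if (!Array.isArray(values) || values.length === 0) {
    // "attr IN ()" is not a valid expression
    throw new Error("values must be a non-empty array");
  }
  if (values.length > 100) {
    // The IN operator accepts at most 100 values
    throw new Error("IN supports at most 100 values; split the list into chunks");
  }
  const attributeValues = {};
  const placeholders = values.map((value, index) => {
    const placeholder = `:v${index}`;
    attributeValues[placeholder] = { S: String(value) }; // assumes string attributes
    return placeholder;
  });
  return {
    FilterExpression: `#attr IN (${placeholders.join(", ")})`,
    // Aliasing the attribute name sidesteps DynamoDB reserved words
    ExpressionAttributeNames: { "#attr": attributeName },
    ExpressionAttributeValues: attributeValues,
  };
}

const filter = buildInFilter("userId", ["user123", "user456", "user789"]);
console.log(filter.FilterExpression); // #attr IN (:v0, :v1, :v2)
```

The returned object can be spread into the Scan parameters, for example { TableName: "Users", ...filter }.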

4. Define ExpressionAttributeValues#

We map each placeholder (e.g., :id0) to its corresponding userId value using ExpressionAttributeValues. This avoids SQL injection-like issues and ensures type safety (e.g., { S: "user123" } specifies a string type).
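To illustrate what the type descriptors look like for other scalar types, here is a minimal sketch of a wrapper function. The name toAttributeValue is hypothetical, and it covers only strings, numbers, and booleans; the marshall function from the @aws-sdk/util-dynamodb package is the maintained way to do this for arbitrary values.

```javascript
// Hypothetical helper: wrap a JavaScript scalar in DynamoDB's attribute-value format.
function toAttributeValue(value) {
  switch (typeof value) {
    case "string":
      return { S: value };
    case "number":
      if (!Number.isFinite(value)) {
        throw new TypeError("NaN and Infinity cannot be stored as N");
      }
      return { N: String(value) }; // numbers are sent as strings
    case "boolean":
      return { BOOL: value };
    default:
      throw new TypeError(`Unsupported type: ${typeof value}`);
  }
}

console.log(JSON.stringify(toAttributeValue("user123"))); // {"S":"user123"}
console.log(JSON.stringify(toAttributeValue(42)));        // {"N":"42"}
console.log(JSON.stringify(toAttributeValue(true)));      // {"BOOL":true}
```

Note that the N descriptor carries the number as a string, which is a common source of validation errors when values are built by hand.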

5. Execute the Scan#

We create a ScanCommand with the parameters and send it via the client. The response's Items array contains the matching items found in the portion of the table read by this call (up to 1 MB).

6. Process Results#

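DynamoDB returns items in its attribute-value format, where each value is wrapped in a type descriptor (e.g., { userId: { S: "user123" } }). We unwrap these into plain JavaScript objects for readability. The one-line conversion in the example is enough for scalar attributes such as strings; for nested lists and maps, the unmarshall function from @aws-sdk/util-dynamodb handles every type.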
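For readers who want to see what a fuller conversion involves, the following is a hand-rolled sketch that also handles numbers, booleans, nulls, lists, maps, and string and number sets. The function names and the loginCount and tags attributes are assumptions for illustration; they are not part of the sample table, and binary types are deliberately left out.

```javascript
// Hypothetical helper: convert one attribute value to a plain JavaScript value.
function fromAttributeValue(av) {
  if ("S" in av) return av.S;
  if ("N" in av) return Number(av.N); // may lose precision for very large numbers
  if ("BOOL" in av) return av.BOOL;
  if ("NULL" in av) return null;
  if ("L" in av) return av.L.map(fromAttributeValue);
  if ("M" in av) return fromItem(av.M);
  if ("SS" in av) return new Set(av.SS);
  if ("NS" in av) return new Set(av.NS.map(Number));
  throw new TypeError(`Unsupported attribute type: ${Object.keys(av).join(", ")}`);
}

// Convert a whole item (a map of attribute name -> attribute value).
function fromItem(item) {
  return Object.fromEntries(
    Object.entries(item).map(([key, value]) => [key, fromAttributeValue(value)])
  );
}

const plain = fromItem({
  userId: { S: "user123" },
  loginCount: { N: "7" },        // illustrative attribute
  tags: { L: [{ S: "admin" }] }, // illustrative attribute
});
console.log(JSON.stringify(plain));
// {"userId":"user123","loginCount":7,"tags":["admin"]}
```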

Expected Output#

Filtered Users: [
  {
    "userId": "user123",
    "name": "Alice Smith",
    "email": "[email protected]",
    "status": "active"
  },
  {
    "userId": "user456",
    "name": "Bob Johnson",
    "email": "[email protected]",
    "status": "inactive"
  },
  {
    "userId": "user789",
    "name": "Charlie Brown",
    "email": "[email protected]",
    "status": "active"
  }
]

Scan does not guarantee any particular item order, so the three users may appear in a different sequence. console.log in Node.js also prints objects in its own inspection format rather than strict JSON; the output above is shown as JSON for readability.

Handling Large Result Sets (Pagination)#

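Each Scan call reads at most 1 MB of data, and that limit is applied before the FilterExpression is evaluated. If the table holds more than that, DynamoDB includes a LastEvaluatedKey in the response, even when few or no items on that page matched the filter. Pass this key as ExclusiveStartKey to continue with the next page, and repeat until no LastEvaluatedKey is returned: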

const scanAllItems = async () => {
  const allItems = [];
  let lastEvaluatedKey; // undefined on the first request

  // The filter is the same for every page, so build it once
  const placeholders = targetUserIds.map((_, index) => `:id${index}`).join(", ");
  const filterExpression = `userId IN (${placeholders})`;
  const expressionAttributeValues = targetUserIds.reduce((acc, id, index) => {
    acc[`:id${index}`] = { S: id };
    return acc;
  }, {});

  do {
    const params = {
      TableName: "Users",
      FilterExpression: filterExpression,
      ExpressionAttributeValues: expressionAttributeValues,
      ExclusiveStartKey: lastEvaluatedKey, // Resume where the previous page stopped
    };

    const command = new ScanCommand(params);
    const response = await client.send(command);

    // Process and accumulate items (a page can be empty even if later pages have matches)
    const items = (response.Items ?? []).map((item) => {
      return Object.entries(item).reduce((acc, [key, value]) => {
        acc[key] = Object.values(value)[0];
        return acc;
      }, {});
    });

    allItems.push(...items);
    lastEvaluatedKey = response.LastEvaluatedKey;

  } while (lastEvaluatedKey);

  console.log("All Filtered Users:", allItems);
  return allItems;
};
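
The pagination loop can also be written as an async generator, which separates paging from result handling. The sketch below is one way to do that, assuming a caller-supplied function that sends a request; scanPages, sendPage, and the fake pages are hypothetical names and data so the example runs without AWS access. With the real client, sendPage would be (params) => client.send(new ScanCommand(params)). The SDK also exports a paginateScan helper from @aws-sdk/client-dynamodb that serves the same purpose.

```javascript
// Hypothetical helper: yield the Items array of every page.
// `sendPage` is any async function that takes request params and resolves to a
// response shaped like a Scan response ({ Items, LastEvaluatedKey }).
async function* scanPages(sendPage, baseParams) {
  let startKey; // undefined for the first page
  do {
    const response = await sendPage({ ...baseParams, ExclusiveStartKey: startKey });
    yield response.Items ?? []; // a page may be empty and still have a successor
    startKey = response.LastEvaluatedKey;
  } while (startKey);
}

// Stand-in for DynamoDB: page 1 has no matches but points at page 2, which has one.
const fakePages = [
  { Items: [], LastEvaluatedKey: { userId: { S: "user000" } } },
  { Items: [{ userId: { S: "user123" } }] },
];
const fakeSend = async (params) => fakePages[params.ExclusiveStartKey ? 1 : 0];

(async () => {
  const found = [];
  for await (const page of scanPages(fakeSend, { TableName: "Users" })) {
    found.push(...page);
  }
  console.log(found.length); // 1
})();
```

The fake first page shows why stopping at an empty Items array would be a bug: only a missing LastEvaluatedKey signals the end of the scan.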

Best Practices#

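  1. Prefer Query Over Scan: If possible, use Query with a known partition key (or index) instead of Scan, as it reads only the matching items. For multiple partition keys, consider parallel Query operations (one per key) and aggregate the results. When the IDs make up the complete primary key, as userId does in the Users table, BatchGetItem can also fetch up to 100 items per request.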

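  2. Limit Returned Attributes: Use ProjectionExpression to fetch only the attributes you need. This reduces data transfer but not the read capacity consumed, which is still based on full item size. Attribute names that are DynamoDB reserved words, such as name and status, must be aliased through ExpressionAttributeNames. Example: ProjectionExpression: "userId, #name, email" with ExpressionAttributeNames: { "#name": "name" }.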

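  3. Avoid Large Scans: For tables with millions of items, Scan can be slow and costly. If a large scan is unavoidable, parallel scans (via the TotalSegments and Segment parameters) shorten the elapsed time, but they read the same amount of data and consume the same total read capacity.

  4. Use Indexes: If filtering on non-key attributes, create a global secondary index (GSI) keyed on that attribute and Query it. When a Scan is still required, scanning a sparse or narrowly projected index reads less data than scanning the base table.

  5. Monitor Costs: Scan consumes RCUs based on the data scanned. Set ReturnConsumedCapacity: "TOTAL" on the request to see the capacity each call used, and use AWS Cost Explorer to track spending over time.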
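
The "parallel Query operations (one per key)" approach from the first practice can be sketched as follows. This is a minimal illustration under stated assumptions: buildQueryInputs, queryByIds, and sendQuery are hypothetical names, the partition key is assumed to be a string, and the fake table stands in for DynamoDB so the example runs locally. With the real client, sendQuery would be (input) => client.send(new QueryCommand(input)).

```javascript
// Hypothetical helper: build one Query input per partition key value.
function buildQueryInputs(tableName, keyName, ids) {
  return ids.map((id) => ({
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": keyName },
    ExpressionAttributeValues: { ":pk": { S: id } }, // assumes a string key
  }));
}

// Run the queries concurrently and flatten the results.
// `sendQuery` is any async function that takes a Query input and resolves to a
// response with an Items array.
async function queryByIds(sendQuery, tableName, keyName, ids) {
  const uniqueIds = [...new Set(ids)]; // avoid querying the same key twice
  const inputs = buildQueryInputs(tableName, keyName, uniqueIds);
  const responses = await Promise.all(inputs.map((input) => sendQuery(input)));
  return responses.flatMap((response) => response.Items ?? []);
}

// Stand-in for DynamoDB so the sketch runs without AWS access
const fakeTable = {
  user123: { userId: { S: "user123" } },
  user456: { userId: { S: "user456" } },
};
const fakeQuery = async (input) => {
  const id = input.ExpressionAttributeValues[":pk"].S;
  return { Items: fakeTable[id] ? [fakeTable[id]] : [] };
};

queryByIds(fakeQuery, "Users", "userId", ["user123", "missing", "user123"])
  .then((items) => console.log(items.length)); // 1
```

Each Query reads only the items under its own key, so the cost scales with the number of IDs rather than the size of the table. Promise.all starts every request at once, so a long ID list would need some form of concurrency limit in practice.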

Troubleshooting Common Issues#

  • Syntax Errors in FilterExpression: Ensure commas and parentheses are correctly placed. Use dynamic placeholder generation (as shown) to avoid typos, and make sure the ID array is not empty, since IN () is not a valid expression.

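  • Incorrect Attribute Types: Use S for strings, N for numbers (with the number itself written as a string, e.g., { N: "42" }), and so on. A malformed value causes a validation error, while a well-formed value of the wrong type (e.g., N for a string attribute) typically just fails to match any item.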

  • Missing Permissions: The IAM role must have dynamodb:Scan permission on the table. Example policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "dynamodb:Scan",
          "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Users"
        }
      ]
    }
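  • Missing or Truncated Results: A single Scan call stops after reading 1 MB of data, so matching items further into the table are not returned. Keep requesting pages with LastEvaluatedKey until the response no longer includes one.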
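
Many expression errors come down to a mismatch between the placeholders written in the expression and the keys supplied in ExpressionAttributeValues: DynamoDB rejects a request both when a placeholder is referenced but not defined and when a value is defined but never used. The sketch below is a local pre-flight check for that; checkPlaceholders is a hypothetical helper, and its regular expression assumes placeholders consist of a colon followed by letters, digits, or underscores.

```javascript
// Hypothetical helper: compare placeholders in an expression with the supplied values.
function checkPlaceholders(expression, attributeValues) {
  const used = new Set(expression.match(/:[A-Za-z0-9_]+/g) ?? []);
  const provided = new Set(Object.keys(attributeValues));
  return {
    missing: [...used].filter((name) => !provided.has(name)), // referenced but not defined
    unused: [...provided].filter((name) => !used.has(name)),  // defined but never referenced
  };
}

const report = checkPlaceholders("userId IN (:id0, :id1)", {
  ":id0": { S: "user123" },
  ":id2": { S: "user789" },
});
console.log(JSON.stringify(report)); // {"missing":[":id1"],"unused":[":id2"]}
```

Running a check like this before sending the request turns a remote validation error into a local message that names the offending placeholder.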

Conclusion#

Using Scan with FilterExpression to filter by an array of partition keys is a viable solution when you need to retrieve items for multiple known IDs. However, always prioritize Query for better performance and cost-efficiency. When using Scan, follow best practices like limiting attributes, paginating results, and monitoring RCU usage.

With the JavaScript SDK v3 example provided, you can dynamically filter items based on an array of hash values and handle large datasets with pagination.

References#