coderain blog

How to Fix AWS S3 Event Replacing Spaces with '+' in Object Key Names Without Breaking Actual '+' Characters

Amazon S3 (Simple Storage Service) is a cornerstone of cloud storage, widely used for hosting files, backups, and data lakes. A common challenge developers face when working with S3 event notifications (e.g., SQS, SNS, or Lambda triggers) is handling object key names containing spaces or special characters like +. Specifically, S3 events may encode spaces as + in object keys, which can conflict with actual + characters in key names. If not handled correctly, this ambiguity can lead to misinterpreted key names, broken workflows, or data loss.

This blog post explains why the issue occurs, what its impact is, and how to decode S3 event keys correctly, step by step, preserving both spaces (encoded as +) and actual + characters.

2026-01

Table of Contents

  1. Understanding the Problem
  2. Why Does This Happen?
  3. Impact of the Issue
  4. Step-by-Step Solution
  5. Testing the Fix
  6. Best Practices
  7. Conclusion
  8. References

1. Understanding the Problem

When objects are uploaded to S3 with spaces in their names (e.g., my document.txt), S3 event notifications (e.g., when a file is created or modified) may encode these spaces as + characters in the object key. For example:

  • Original key: my document.txt → Encoded in S3 event: my+document.txt.

The confusion arises when object keys intentionally contain + characters (e.g., my+file.txt). A + in an event key then appears to have two possible origins:

  • A space encoded as + (e.g., my document.txt → my+document.txt).
  • An actual + in the key (e.g., my+file.txt, if it were passed through unchanged as my+file.txt).

As Section 2 explains, S3 actually encodes a literal + as %2B, so the two cases can be told apart as long as the key is decoded correctly.

If your application naively replaces all + characters with spaces to "fix" the encoded spaces, it will corrupt keys with actual + characters (e.g., my+file.txt becomes my file.txt).
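The pitfall can be sketched in a few lines of Python. The helper name naive_fix is hypothetical; it stands for any blanket replacement of + with a space applied after percent-decoding. The sample key assumes, as Section 2 describes, that a literal + arrives in the event as %2B.

```python
from urllib.parse import unquote, unquote_plus

def naive_fix(event_key: str) -> str:
    """Hypothetical buggy helper: percent-decode first, then turn every '+' into a space."""
    return unquote(event_key).replace("+", " ")

# Key as it would appear in an event for an object really named "my+file.txt"
raw = "my%2Bfile.txt"

broken = naive_fix(raw)      # '%2B' becomes '+', which is then wrongly turned into a space
correct = unquote_plus(raw)  # '+' -> space is applied before '%2B' -> '+', so the '+' survives

print(repr(broken))
print(repr(correct))
```

The only difference between the two paths is the order of operations: the plus-to-space substitution has to happen before percent-decoding, not after.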

2. Why Does This Happen?

S3 uses URL encoding for object keys in event notifications to handle special characters safely. URL encoding converts characters that are not safe to use literally into a %xx format (e.g., %20 for spaces, %2B for +). However, confusion arises from two common encoding conventions:

  • Standard URL Encoding (application/x-www-form-urlencoded): Historically used in HTML forms, this replaces spaces with + and other special characters with %xx (e.g., + → %2B).
  • Modern URL Encoding (percent-encoding as defined in RFC 3986): Replaces spaces with %20 and + with %2B, avoiding ambiguity.

S3 event notifications use the first, form-style convention. This means:

  • Spaces in object keys are encoded as +.
  • Actual + characters in keys are encoded as %2B (not +).

The root of the problem is often incorrect decoding logic (e.g., using unquote instead of unquote_plus in Python, or percent-decoding first and then replacing every + with a space) that fails to account for this encoding distinction.
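To make the two conventions concrete, here is a small sketch using Python's standard library. It assumes that quote_plus is a reasonable stand-in for the form-style encoding described above; it is not S3's own encoder, and the sample key name is made up for illustration.

```python
from urllib.parse import quote, quote_plus

key = "my report+v2.txt"  # hypothetical key containing both a space and a literal '+'

form_style = quote_plus(key)  # form-style: space -> '+', '+' -> '%2B'
percent_style = quote(key)    # percent-encoding: space -> '%20', '+' -> '%2B'

print(form_style)
print(percent_style)
```

In both conventions a literal + becomes %2B; only the treatment of the space differs, which is why a bare + in a form-style string can always be read as a space.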

3. Impact of the Issue

Misinterpreting encoded keys can lead to critical issues:

  • Data Loss: If your application mis-decodes a key and requests my file.txt instead of the actual key my+file.txt (which appears in the event as my%2Bfile.txt), it will fail to find the object.
  • Corrupted Workflows: Automated pipelines (e.g., data processing, backups) may process the wrong files or fail entirely.
  • Debugging Nightmares: Ambiguous key names make it hard to trace why files are missing or operations fail.

4. Step-by-Step Solution

To resolve this, we need to decode S3 event keys using logic that:

  • Replaces + with spaces (for encoded spaces).
  • Decodes %2B to + (for actual + characters in keys).

Step 1: Confirm S3 Event Encoding

First, verify how S3 encodes spaces and + in your event notifications:

  1. Upload Test Objects:

    • Create two test files:
      • test space.txt (with a space).
      • test+plus.txt (with an actual +).
    • Upload them to your S3 bucket.
  2. Inspect the Raw S3 Event:

    • Use tools like the AWS Console (SQS queue, CloudWatch Logs for Lambda) to view the raw event payload.
    • You should see:
      • For test space.txt: Key encoded as test+space.txt.
      • For test+plus.txt: Key encoded as test%2Bplus.txt.

    Example raw event snippet (truncated for clarity). In the first record the space is encoded as +; in the second, the + is encoded as %2B:

    {
      "Records": [
        {
          "s3": {
            "object": {
              "key": "test+space.txt"
            }
          }
        },
        {
          "s3": {
            "object": {
              "key": "test%2Bplus.txt"
            }
          }
        }
      ]
    }
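If you have captured a raw payload (for example, from CloudWatch Logs), a short script can list the keys exactly as S3 sent them. This is a minimal sketch; the payload below just reuses the truncated two-record example, not a complete S3 event.

```python
import json

# Truncated payload mirroring the example records; a real event has many more fields
raw_event = """
{
  "Records": [
    {"s3": {"object": {"key": "test+space.txt"}}},
    {"s3": {"object": {"key": "test%2Bplus.txt"}}}
  ]
}
"""

event = json.loads(raw_event)

# Collect the keys without decoding them, so the encoding is visible
raw_keys = [record["s3"]["object"]["key"] for record in event["Records"]]

for key in raw_keys:
    print(repr(key))
```

Printing with repr makes stray whitespace or unexpected characters easy to spot while you are confirming the encoding.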

Step 2: Use Proper Decoding Logic

To decode the key correctly, use a function that:

  1. Replaces + with spaces (to fix encoded spaces).
  2. Decodes %xx sequences (to fix encoded + and other characters).
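The two steps can be written out explicitly to show that their order matters. This is only an illustrative sketch (the function name decode_event_key is hypothetical); in practice a library function that already combines both steps is preferable.

```python
from urllib.parse import unquote, unquote_plus

def decode_event_key(event_key: str) -> str:
    """Illustrative two-step decode of a form-style encoded key."""
    with_spaces = event_key.replace("+", " ")  # step 1: every remaining '+' is an encoded space
    return unquote(with_spaces)                # step 2: '%2B' -> '+', '%20' -> ' ', and so on

samples = ["test+space.txt", "test%2Bplus.txt", "test+with%2Bplus.txt"]

for sample in samples:
    # The manual version agrees with the standard library's combined function
    assert decode_event_key(sample) == unquote_plus(sample)
    print(repr(sample), "->", repr(decode_event_key(sample)))
```

Because step 1 runs while literal plus signs are still hidden behind %2B, it cannot touch them.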

Example Implementations

Python

Use urllib.parse.unquote_plus, which handles both + → space and %xx → character:

from urllib.parse import unquote_plus

# A space arrives in the event as '+'
encoded_key = "test+space.txt"
decoded_key = unquote_plus(encoded_key)
print(decoded_key)  # Output: test space.txt

# A literal '+' arrives in the event as '%2B'
encoded_key = "test%2Bplus.txt"
decoded_key = unquote_plus(encoded_key)
print(decoded_key)  # Output: test+plus.txt

JavaScript (Node.js)

Use decodeURIComponent with a preprocessing step to replace + with spaces:

function decodeS3Key(encodedKey) {  
  return decodeURIComponent(encodedKey.replace(/\+/g, ' '));  
}  
 
const encodedKey1 = "test+space.txt";  
console.log(decodeS3Key(encodedKey1));  // Output: "test space.txt"  
 
const encodedKey2 = "test%2Bplus.txt";  
console.log(decodeS3Key(encodedKey2));  // Output: "test+plus.txt"

Java

Use URLDecoder.decode with UTF-8 encoding (handles + → space and %xx → character):

import java.net.URLDecoder;  
 
public class S3KeyDecoder {  
  public static void main(String[] args) throws Exception {  
    String encodedKey1 = "test+space.txt";  
    String decodedKey1 = URLDecoder.decode(encodedKey1, "UTF-8");  
    System.out.println(decodedKey1);  // Output: "test space.txt"  
 
    String encodedKey2 = "test%2Bplus.txt";  
    String decodedKey2 = URLDecoder.decode(encodedKey2, "UTF-8");  
    System.out.println(decodedKey2);  // Output: "test+plus.txt"  
  }  
}  

Step 3: Integrate Decoding into Your Workflow

Update your application code to use the decoding logic above when processing S3 events. For example, in a Lambda function triggered by S3:

from urllib.parse import unquote_plus

def lambda_handler(event, context):
    for record in event['Records']:
        # The key in the event is URL-encoded; decode it before use
        encoded_key = record['s3']['object']['key']
        decoded_key = unquote_plus(encoded_key)
        print(f"Decoded key: {decoded_key}")
        # Use decoded_key to process the object (e.g., download from S3)
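One way to keep the decoding in a single place is a small helper that turns an event into (bucket, key) pairs, which the handler can then pass to whatever S3 call it needs. This is a sketch, not a complete handler: the function name extract_objects and the sample bucket name are hypothetical, and it assumes the usual event layout with the bucket name under s3.bucket.name.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus

def extract_objects(event: dict) -> List[Tuple[str, str]]:
    """Return (bucket, decoded_key) pairs for every record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])  # decode exactly once
        objects.append((bucket, key))
    return objects

# Hypothetical event trimmed to the fields the helper reads
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "test+space.txt"}}},
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "test%2Bplus.txt"}}},
    ]
}

objects = extract_objects(sample_event)
for bucket, key in objects:
    print(bucket, repr(key))
```

Decoding exactly once matters: running an already decoded key through the decoder a second time would turn a genuine + into a space.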

5. Testing the Fix

Validate the solution with these test cases:

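Test Case           | Encoded Key in Event | Expected Decoded Key | Actual Decoded Key (with fix)
--------------------|----------------------|----------------------|------------------------------
Space in key        | test+space.txt       | test space.txt       | test space.txt
+ in key            | test%2Bplus.txt      | test+plus.txt        | test+plus.txt
Mixed (space and +) | test+with%2Bplus.txt | test with+plus.txt   | test with+plus.txt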

If all test cases pass, your decoding logic is correct.
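The same test cases can be expressed as a small, repeatable check. The sketch below uses plain Python rather than a test framework; the names CASES and run_cases are hypothetical, and the decoder is passed in so you can point it at your own function.

```python
from urllib.parse import unquote, unquote_plus

# (encoded key in event, expected decoded key)
CASES = [
    ("test+space.txt", "test space.txt"),
    ("test%2Bplus.txt", "test+plus.txt"),
    ("test+with%2Bplus.txt", "test with+plus.txt"),
]

def run_cases(decode=unquote_plus):
    """Return a list of (encoded, expected, actual) tuples for every failing case."""
    return [
        (encoded, expected, decode(encoded))
        for encoded, expected in CASES
        if decode(encoded) != expected
    ]

failures = run_cases()                # decoder that handles '+' and '%xx'
unquote_failures = run_cases(unquote)  # decoder that ignores '+', for comparison

print("unquote_plus failures:", failures)
print("unquote failures:", unquote_failures)
```

Running the cases against unquote as well shows what a wrong decoder looks like: it leaves every + in place, so the space and mixed cases fail.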

6. Best Practices

  • Always Decode Keys: Never use the raw encoded key from S3 events—decode it first.
  • Log Encoded/Decoded Keys: For debugging, log both the raw encoded key and the decoded result.
  • Test Edge Cases: Validate with keys containing spaces, +, %, and other special characters (e.g., test%20and%2Bplus.txt).
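  • Check What Your SDK Expects: AWS SDK operations (e.g., get_object in Boto3 for Python) take the decoded key, and the event payload reaches your code as plain JSON, so do not assume decoding happens automatically. If your event library offers a decoded-key accessor, use it; otherwise decode explicitly.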
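The logging and edge-case advice can be combined in one round-trip check. This is a sketch under a stated assumption: quote_plus stands in for S3's encoder, which may leave some characters (such as /) unencoded, although that does not change how decoding works. The logger name and the sample keys are made up for illustration.

```python
import logging
from urllib.parse import quote_plus, unquote_plus

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("s3-keys")  # hypothetical logger name

def decode_and_log(event_key: str) -> str:
    """Decode an event key and log both the raw and the decoded form."""
    decoded = unquote_plus(event_key)
    log.info("raw=%r decoded=%r", event_key, decoded)
    return decoded

# Keys that exercise spaces, '+', '%', a name that looks pre-encoded, and non-ASCII text
edge_keys = [
    "a b.txt",
    "a+b.txt",
    "100%.txt",
    "test%20and%2Bplus.txt",
    "dir/naïve file.txt",
]

for key in edge_keys:
    encoded = quote_plus(key)  # stand-in for the encoding applied by S3
    assert decode_and_log(encoded) == key
```

Note the fourth key: an object literally named test%20and%2Bplus.txt is encoded with %25 in place of each %, and decoding once restores the original name instead of collapsing it to "test and+plus.txt".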

7. Conclusion

S3 event notifications encode spaces as + and + as %2B, leading to ambiguity if decoded incorrectly. By using proper decoding functions (e.g., unquote_plus in Python, URLDecoder.decode in Java), you can preserve actual + characters while fixing encoded spaces. Always validate with test cases to ensure your solution works for both scenarios.

8. References