coderain blog

AWS S3: How to List Objects Filtered by Tags? If Not Possible, What Is the Purpose of S3 Object Tags?

Amazon S3 (Simple Storage Service) is the backbone of cloud storage for millions of users, offering scalability, durability, and flexibility for storing objects (files, images, backups, etc.). As S3 buckets grow in size—often containing millions of objects—organizing and managing these objects becomes critical. One common way to categorize objects is using S3 object tags: key-value pairs attached to objects to add metadata (e.g., Environment=Production, Project=Alpha, Archive=true).

A frequent question among S3 users is: Can I directly list objects in an S3 bucket filtered by their tags? If not, why bother using tags at all?

In this blog, we’ll answer these questions in detail. We’ll first explore whether S3 natively supports listing objects by tags, then dive into workarounds for achieving tag-based filtering. Finally, we’ll unpack the key purposes of S3 object tags—even if direct listing isn’t possible—to show why they’re an indispensable tool for S3 management.

2026-01

Table of Contents#

  1. What Are S3 Object Tags?
  2. Can You List S3 Objects Filtered by Tags Directly?
  3. Workarounds to List Objects by Tags
  4. The Real Purpose of S3 Object Tags
  5. Conclusion
  6. References

What Are S3 Object Tags?#

S3 object tags are user-defined key-value pairs (e.g., Key=Value) attached to individual S3 objects. They’re distinct from bucket tags (which apply to the entire bucket) and are stored as metadata alongside the object.

  • Limitations: Each object can have up to 10 tags. Tag keys are case-sensitive and can be up to 128 characters long; values can be up to 256 characters and may be empty.
  • Use Cases: Tags are not just for organization—they power critical S3 features like access control, cost tracking, and lifecycle management (more on this later).
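Tags can be set at upload time (via the Tagging parameter) or attached afterward with the PutObjectTagging API. A minimal Boto3 sketch — the bucket and key names are placeholders, and the code assumes AWS credentials are configured:

```python
def to_tag_set(tags: dict) -> list:
    """Convert a plain dict into the TagSet shape the S3 tagging APIs expect."""
    return [{"Key": k, "Value": v} for k, v in tags.items()]

if __name__ == "__main__":
    import boto3  # assumes credentials and an existing bucket/object

    s3 = boto3.client("s3")
    # Attach (or replace) the tags on an existing object.
    s3.put_object_tagging(
        Bucket="my-bucket",
        Key="reports/q1.csv",
        Tagging={"TagSet": to_tag_set({"Environment": "Production", "Project": "Alpha"})},
    )
    # Read the tags back.
    response = s3.get_object_tagging(Bucket="my-bucket", Key="reports/q1.csv")
    print(response["TagSet"])
```

Note that PutObjectTagging replaces the object's entire tag set, so to add one tag you must read the existing set, merge, and write it back.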

Can You List S3 Objects Filtered by Tags Directly?#

Short Answer: No.

At the time of writing, AWS S3 does not provide a native API or console feature to directly list objects in a bucket filtered by their tags. The primary listing APIs—ListObjectsV2 and the legacy ListObjects—return objects based on prefixes, delimiters, or a maximum key count, but they do not support tag-based filtering.

To verify this, check the AWS S3 API Reference: the ListObjectsV2 request parameters include Prefix, Delimiter, and MaxKeys, but no TagFilters or similar parameter.

Workarounds to List Objects by Tags#

While direct tag-based listing isn’t supported, you can use workarounds to achieve tag-filtered object lists. These methods involve fetching object metadata (including tags) and then filtering locally or via external tools. Here are the most common approaches:

Option 1: Use S3 Inventory + Amazon Athena#

S3 Inventory is a built-in feature that generates daily or weekly reports of your bucket’s objects (and their metadata, including tags) in CSV, Parquet, or ORC format. You can then query these reports using Amazon Athena (a serverless SQL query service) to filter objects by tags.

Step-by-Step Workflow:#

  1. Enable S3 Inventory for Your Bucket:

    • Go to the S3 Console → Select your bucket → Management → Inventory configurations → Create inventory configuration.
    • Name the inventory (e.g., tag-inventory).
    • Under Destination, specify a bucket to store the inventory report (can be the same bucket or a different one).
    • Under Included object versions, choose Current version (or All versions if needed).
    • Under Optional fields, check Tags to include object tags in the report.
    • Save the configuration.
  2. Wait for the Inventory Report:
    S3 generates the first report within 48 hours (subsequent reports are daily/weekly as configured). The report will be stored in your destination bucket at a path like:

    s3://<destination-bucket>/<inventory-name>/YYYY-MM-DDTHH-MMZ/  
    
  3. Query the Report with Athena:

    • Go to the Athena Console → Query Editor.
    • Create a database (e.g., s3_inventory_db).
    • Create a table over your inventory CSV using a CREATE TABLE statement. The CSV report has no header row, and OpenCSVSerDe reads every column as a string, so declare string columns and cast in your queries as needed. Example:
      CREATE EXTERNAL TABLE IF NOT EXISTS s3_inventory_db.object_tags (
        bucket string,
        key string,
        version_id string,
        is_latest string,
        is_delete_marker string,
        size string,
        last_modified_date string,
        e_tag string,
        storage_class string,
        tags string
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
      LOCATION 's3://<destination-bucket>/<inventory-name>/';
    • Run a query to filter objects by tag. In the CSV report, the tags field arrives as a single URL-encoded string rather than a structured column, so filter with a string match. For example, find all objects with Environment=Production:
      SELECT key, tags
      FROM s3_inventory_db.object_tags
      WHERE url_decode(tags) LIKE '%Environment=Production%';

Pros: Scalable for large buckets (millions of objects); automated reports.
Cons: Not real-time (reports are daily/weekly); requires setup of Athena and S3 Inventory.

Option 2: AWS CLI + Tag Enumeration#

If you need a real-time (but slower) solution, use the AWS CLI to list all objects in the bucket, then fetch tags for each object and filter locally with tools like jq (JSON processor).

Step-by-Step Workflow:#

  1. List All Objects in the Bucket:
    Use aws s3api list-objects-v2 to get object keys:

    aws s3api list-objects-v2 --bucket my-bucket --query 'Contents[].Key' --output json > all_objects.json  
  2. Fetch Tags for Each Object and Filter:
    Loop through the object keys, fetch tags with aws s3api get-object-tagging, and filter for your target tag. Use jq to parse results:

    # Example: Filter objects with tag "Project=Alpha"  
    while IFS= read -r key; do  
      tags=$(aws s3api get-object-tagging --bucket my-bucket --key "$key" --query 'TagSet[]' --output json)  
      if echo "$tags" | jq -e '.[] | select(.Key == "Project" and .Value == "Alpha")' > /dev/null; then  
        echo "Object with tag: $key"  
      fi  
    done < <(jq -r '.[]' all_objects.json)  

Pros: Real-time; no external services needed.
Cons: Slow for large buckets (fetches tags one object at a time); high API call costs (each get-object-tagging is a request).

Option 3: SDK Scripting (Python, JavaScript, etc.)#

For programmatic control, use AWS SDKs (e.g., Boto3 for Python) to automate the CLI workflow at scale. You can parallelize tag fetching to speed up the process.

Python Example with Boto3:#

import boto3  
import concurrent.futures  
 
s3 = boto3.client('s3')  
bucket_name = 'my-bucket'  
target_tag_key = 'Environment'  
target_tag_value = 'Production'  
 
# Step 1: List all object keys (paginated: list_objects_v2 returns at most 1,000 keys per call)  
paginator = s3.get_paginator('list_objects_v2')  
object_keys = []  
for page in paginator.paginate(Bucket=bucket_name):  
    object_keys.extend(obj['Key'] for obj in page.get('Contents', []))  
 
# Step 2: Fetch tags for each object (parallelized)  
def get_tags(key):  
    try:  
        response = s3.get_object_tagging(Bucket=bucket_name, Key=key)  
        tags = {tag['Key']: tag['Value'] for tag in response.get('TagSet', [])}  
        return (key, tags)  
    except Exception as e:  
        print(f"Error fetching tags for {key}: {e}")  
        return (key, {})  
 
# Use ThreadPoolExecutor to speed up tag fetching  
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:  
    results = executor.map(get_tags, object_keys)  
 
# Step 3: Filter objects with the target tag  
filtered_objects = [key for key, tags in results if tags.get(target_tag_key) == target_tag_value]  
 
print(f"Objects with {target_tag_key}={target_tag_value}: {filtered_objects}")  

Pros: Customizable; parallelization improves speed.
Cons: Still slow for buckets with 100k+ objects; requires coding.

The Real Purpose of S3 Object Tags#

While direct tag-based listing isn’t supported, S3 object tags are far from useless. They enable powerful workflows for access control, cost management, automation, and compliance. Here are their core use cases:

1. Access Control with Bucket Policies#

S3 bucket policies use condition keys to restrict access to objects based on tags. For example, you can allow read access only to objects tagged Environment=Production or deny deletion of objects tagged Archive=true.

Example: Deny Deletion of Archived Objects#

{  
  "Version": "2012-10-17",  
  "Statement": [  
    {  
      "Sid": "DenyDeleteArchivedObjects",  
      "Effect": "Deny",  
      "Principal": "*",  
      "Action": "s3:DeleteObject",  
      "Resource": "arn:aws:s3:::my-bucket/*",  
      "Condition": {  
        "StringEquals": {  
          "s3:ExistingObjectTag/Archive": "true"  
        }  
      }  
    }  
  ]  
}  

2. Cost Allocation and Tracking#

AWS Cost Explorer and AWS Cost and Usage Report (CUR) let you track storage costs by tags. This is critical for chargeback (e.g., billing departments based on Department=Finance tags) or identifying expensive projects.

Setup Steps:#

  1. Go to the AWS Billing Console → Cost Allocation Tags.
  2. Activate the tag key (e.g., Project) for cost allocation.
  3. In Cost Explorer, filter costs by Tag:Project=Alpha to see spending for that project.
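The Cost Explorer step can also be scripted with Boto3's get_cost_and_usage API. A hedged sketch — the date range and tag values are illustrative, and it assumes the Project tag has already been activated for cost allocation:

```python
def tag_filter(tag_key: str, tag_value: str) -> dict:
    """Build the Cost Explorer filter expression for a single tag key/value."""
    return {"Tags": {"Key": tag_key, "Values": [tag_value]}}

if __name__ == "__main__":
    import boto3  # assumes credentials with Cost Explorer access

    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2026-01-01", "End": "2026-02-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter=tag_filter("Project", "Alpha"),
    )
    # Print the monthly cost attributed to Project=Alpha.
    for result in response["ResultsByTime"]:
        print(result["TimePeriod"], result["Total"]["UnblendedCost"]["Amount"])
```

Keep in mind that cost data only accrues under a tag from the date the tag key is activated for cost allocation, not retroactively.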

3. Lifecycle Policies#

S3 lifecycle policies automate object transitions (e.g., to S3 Glacier) or expiration (deletion) based on tags. For example, transition objects tagged Archive=true to Glacier after 30 days to reduce storage costs.

Example: Lifecycle Policy for Archived Objects#

{  
  "Rules": [  
    {  
      "ID": "ArchiveTaggedObjects",  
      "Status": "Enabled",  
      "Filter": {  
        "Tag": {  
          "Key": "Archive",  
          "Value": "true"  
        }  
      },  
      "Transitions": [  
        {  
          "Days": 30,  
          "StorageClass": "GLACIER"  
        }  
      ]  
    }  
  ]  
}  
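Such a rule can also be applied programmatically with Boto3's put_bucket_lifecycle_configuration. A sketch under the assumption that the bucket exists and you have lifecycle permissions (the bucket name is a placeholder); note that this call replaces the bucket's entire lifecycle configuration:

```python
def archive_rule(tag_key: str, tag_value: str, days: int = 30) -> dict:
    """Build a lifecycle rule that transitions tagged objects to Glacier after `days` days."""
    return {
        "ID": "ArchiveTaggedObjects",
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }

if __name__ == "__main__":
    import boto3  # assumes credentials and an existing bucket

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={"Rules": [archive_rule("Archive", "true")]},
    )
```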

4. Compliance and Auditing#

Tags simplify compliance by marking objects as sensitive (e.g., PII=true) or compliant (e.g., HIPAA=Compliant). You can use AWS CloudTrail to log tag changes (e.g., who added/removed a PII tag) for audit trails.

5. Automation and Event-Driven Workflows#

Tags can drive automation with AWS Lambda. For example:

  • When an object is uploaded, trigger a Lambda via an S3 event notification and have the function check for a Process=true tag before processing the file (e.g., resizing images). Note that S3 event notifications filter by key prefix/suffix only, not by tag, so the tag check happens inside the function.
  • Enable Amazon EventBridge notifications on the bucket and match tag-change events (e.g., Object Tags Added / Object Tags Deleted) to alert admins if a Production tag is removed.
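A sketch of the first pattern: a Lambda handler that looks up the uploaded object's tags and only acts when Process=true is present. The print call stands in for your actual processing logic, and the s3_client parameter is an optional hook for testing:

```python
import urllib.parse

def has_tag(tag_set: list, key: str, value: str) -> bool:
    """Return True if the TagSet (list of {'Key': ..., 'Value': ...} dicts) contains key=value."""
    return any(t["Key"] == key and t["Value"] == value for t in tag_set)

def handler(event, context, s3_client=None):
    """S3 event notifications can't filter by tag, so the function checks tags itself."""
    if s3_client is None:
        import boto3  # resolved at invocation time inside the Lambda runtime
        s3_client = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event records.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        tags = s3_client.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
        if has_tag(tags, "Process", "true"):
            print(f"Processing {bucket}/{key}")  # replace with real work, e.g., image resizing
```

One design caveat: tags set during upload are visible to this handler, but tags added after upload will not re-fire the object-created event, so late tagging needs the EventBridge pattern instead.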

Conclusion#

While AWS S3 does not natively support direct listing of objects filtered by tags, workarounds like S3 Inventory + Athena or CLI/SDK scripting can achieve this (with tradeoffs in speed and complexity).

More importantly, S3 object tags are not just for organization—they are a foundational tool for access control, cost management, lifecycle automation, compliance, and event-driven workflows. Their value lies in enabling scalable, policy-based governance of your S3 data.

So, even without direct tag-based listing, tags remain indispensable for managing S3 at scale.

References#

  • Amazon S3 User Guide: Categorizing your storage using tags
  • Amazon S3 API Reference: ListObjectsV2
  • Amazon S3 User Guide: Amazon S3 Inventory
  • Amazon S3 User Guide: Managing your storage lifecycle
  • AWS Billing User Guide: Using cost allocation tags