
Gateway Filters

Configure your AR.IO Gateway to process and index only the data you need. This guide covers filter construction, configuration strategies, real-world use cases, and performance tuning.

Overview

The AR.IO Gateway uses a flexible JSON-based filtering system to control data processing and indexing. The system provides precise control over which bundles are processed and which data items are indexed for querying.

Understanding the Filtering System

The AR.IO Gateway uses two primary filters to control data processing:

  1. ANS104_UNBUNDLE_FILTER - Controls which bundles are processed and unbundled
  2. ANS104_INDEX_FILTER - Controls which data items from unbundled bundles are indexed for querying

By default, gateways process no bundles and index no data items. You must explicitly configure filters to start processing data.

Core Environment Variables

Configure Data Management

Optimize data storage and processing:

# Number of new data items before flushing to stable storage
DATA_ITEM_FLUSH_COUNT_THRESHOLD=1000

# Maximum time between flushes (in seconds)
MAX_FLUSH_INTERVAL_SECONDS=600

# Maximum number of data items to queue for indexing
MAX_DATA_ITEM_QUEUE_SIZE=100000

# Enable background verification
ENABLE_BACKGROUND_DATA_VERIFICATION=true

Set Up GraphQL Configuration

Choose between local-only or proxied queries:

# For new gateways - proxy to arweave.net for complete index
GRAPHQL_HOST=arweave.net
GRAPHQL_PORT=443

# For local-only queries (uncomment to use)
# GRAPHQL_HOST=

Filter Construction

.env formatting

While the filters below are displayed on multiple lines for readability, they must be stored in the .env file as a single line for proper processing.
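One way to avoid collapsing a filter by hand is to let a JSON library do it. The sketch below uses a hypothetical helper, `to_env_line`, which is not part of the gateway. It parses a pretty-printed filter and emits it as one compact line. Whether your .env loader needs quotes around the value is an assumption to verify for your setup; the helper emits the JSON without quotes.

```python
import json


def to_env_line(name: str, filter_text: str) -> str:
    """Collapse a pretty-printed JSON filter into a single NAME=value line."""
    parsed = json.loads(filter_text)  # raises ValueError if the JSON is invalid
    compact = json.dumps(parsed, separators=(",", ":"))  # no spaces or newlines
    return f"{name}={compact}"


# A filter written across several lines for readability.
multi_line_filter = """
{
  "tags": [
    {
      "name": "App-Name",
      "valueStartsWith": "MyApp"
    }
  ]
}
"""

print(to_env_line("ANS104_UNBUNDLE_FILTER", multi_line_filter))
# ANS104_UNBUNDLE_FILTER={"tags":[{"name":"App-Name","valueStartsWith":"MyApp"}]}
```

Parsing before re-serializing also catches syntax errors early, before the gateway ever reads the file.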

Basic Filters

The simplest filters you can use are "always" and "never" filters. The "never" filter is the default behavior and will match nothing, while the "always" filter matches everything.

Never (default behavior):

{
  "never": true
}

Always:

{
  "always": true
}

Tag Filters

Tag filters match items based on their tags in three ways: by exact tag value, by the presence of a tag regardless of its value, or by a tag value that starts with specific text. Tag names and values are automatically base64url-decoded before matching.

Exact value match:

{
  "tags": [
    {
      "name": "Content-Type",
      "value": "image/jpeg"
    }
  ]
}

Tag presence (any value):

{
  "tags": [
    {
      "name": "App-Name"
    }
  ]
}

Value prefix match:

{
  "tags": [
    {
      "name": "Protocol",
      "valueStartsWith": "AO"
    }
  ]
}

Attribute Filters

Attribute filtering allows you to match items based on their metadata properties. The system automatically handles owner public key to address conversion, making it easy to filter by owner address. You can combine multiple attributes in a single filter:

{
  "attributes": {
    "owner_address": "xyz123...",
    "data_size": 1000
  }
}

Nested Bundle Filter

The isNestedBundle filter is a specialized filter that checks whether a data item is part of a nested bundle structure. It's particularly useful when you need to identify or process data items in bundles that are contained within other bundles.

{
  "isNestedBundle": true
}

Note: When processing nested bundles, be sure to include filters that match the nested bundles in both ANS104_UNBUNDLE_FILTER and ANS104_INDEX_FILTER. The bundle data items (nested bundles) need to be indexed to be matched by the unbundle filter.

Complex Filters Using Logical Operators

For more complex scenarios, the system provides the logical operators "and", "or", and "not". They can be combined and nested to any depth to build precise filtering patterns:

{
  "and": [
    {
      "tags": [
        {
          "name": "App-Name",
          "value": "ArDrive-App"
        }
      ]
    },
    {
      "tags": [
        {
          "name": "Content-Type",
          "valueStartsWith": "image/"
        }
      ]
    }
  ]
}

{
  "or": [
    {
      "tags": [
        {
          "name": "App-Name",
          "value": "ArDrive-App"
        }
      ]
    },
    {
      "attributes": {
        "data_size": 1000
      }
    }
  ]
}

{
  "not": {
    "tags": [
      {
        "name": "Content-Type",
        "value": "application/json"
      }
    ]
  }
}
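To see how these shapes compose, here is a toy evaluator written against the semantics described in this guide. It is an illustrative sketch, not the gateway's implementation. The names `matches` and `_tag_matches` are hypothetical, items are assumed to carry already-decoded tags, and comparing attributes by plain equality is an assumption.

```python
from typing import Any


def _tag_matches(cond: dict[str, str], tags: list[dict[str, str]]) -> bool:
    """True if any tag on the item satisfies one tag condition."""
    for tag in tags:
        if tag["name"] != cond["name"]:
            continue
        if "value" in cond:
            if tag["value"] == cond["value"]:
                return True
        elif "valueStartsWith" in cond:
            if tag["value"].startswith(cond["valueStartsWith"]):
                return True
        else:
            return True  # name-only condition: presence is enough
    return False


def matches(filter_: dict[str, Any], item: dict[str, Any]) -> bool:
    """Evaluate one filter object against one item."""
    if "always" in filter_:
        return True
    if "never" in filter_:
        return False
    if "and" in filter_:
        return all(matches(f, item) for f in filter_["and"])
    if "or" in filter_:
        return any(matches(f, item) for f in filter_["or"])
    if "not" in filter_:
        return not matches(filter_["not"], item)
    if "tags" in filter_:
        # Every listed tag condition must be satisfied.
        return all(_tag_matches(c, item.get("tags", [])) for c in filter_["tags"])
    if "attributes" in filter_:
        # Assumption: every listed attribute must equal the item's value.
        return all(item.get(k) == v for k, v in filter_["attributes"].items())
    return False  # empty or unrecognized filters match nothing


item = {
    "tags": [
        {"name": "App-Name", "value": "ArDrive-App"},
        {"name": "Content-Type", "value": "image/png"},
    ],
    "data_size": 2048,
}

and_filter = {"and": [
    {"tags": [{"name": "App-Name", "value": "ArDrive-App"}]},
    {"tags": [{"name": "Content-Type", "valueStartsWith": "image/"}]},
]}
not_filter = {"not": {"tags": [{"name": "Content-Type", "value": "application/json"}]}}

print(matches(and_filter, item))  # True
print(matches(not_filter, item))  # True
print(matches({}, item))          # False
```

Running candidate filters against a few sample items like this is a cheap way to check the logic before deploying a filter.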

Filter Configuration Strategies

Process Everything

{
  "always": true
}

Process Nothing (Default)

{
  "never": true
}

Process Specific App Data

{
  "tags": [
    {
      "name": "App-Name",
      "valueStartsWith": "MyApp"
    }
  ]
}

Single Application

{
  "tags": [
    {
      "name": "App-Name",
      "value": "MyApp-v1.0"
    }
  ]
}

Multiple Applications

{
  "or": [
    {
      "tags": [
        {
          "name": "App-Name",
          "value": "MyApp-v1.0"
        }
      ]
    },
    {
      "tags": [
        {
          "name": "App-Name",
          "value": "AnotherApp-v2.1"
        }
      ]
    }
  ]
}

Application with Version Range

{
  "tags": [
    {
      "name": "App-Name",
      "valueStartsWith": "MyApp"
    }
  ]
}

Content Type Filtering

{
  "tags": [
    {
      "name": "Content-Type",
      "valueStartsWith": "image/"
    }
  ]
}

Specific File Types

{
  "or": [
    {
      "tags": [
        {
          "name": "Content-Type",
          "value": "application/json"
        }
      ]
    },
    {
      "tags": [
        {
          "name": "Content-Type",
          "value": "text/plain"
        }
      ]
    }
  ]
}

File Size Filtering

{
  "attributes": {
    "data_size": 1000000
  }
}

Single Owner

{
  "attributes": {
    "owner_address": "YOUR_WALLET_ADDRESS"
  }
}

Multiple Owners

{
  "or": [
    {
      "attributes": {
        "owner_address": "WALLET_ADDRESS_1"
      }
    },
    {
      "attributes": {
        "owner_address": "WALLET_ADDRESS_2"
      }
    }
  ]
}

Exclude Specific Owners

{
  "not": {
    "attributes": {
      "owner_address": "UNWANTED_WALLET_ADDRESS"
    }
  }
}

Complex Multi-Condition Filter

{
  "and": [
    {
      "tags": [
        {
          "name": "App-Name",
          "valueStartsWith": "MyApp"
        }
      ]
    },
    {
      "attributes": {
        "owner_address": "YOUR_WALLET_ADDRESS"
      }
    },
    {
      "not": {
        "tags": [
          {
            "name": "Content-Type",
            "value": "application/octet-stream"
          }
        ]
      }
    }
  ]
}

Exclude Common Bundlers

{
  "and": [
    {
      "not": {
        "or": [
          {
            "tags": [
              {
                "name": "Bundler-App-Name",
                "value": "Warp"
              }
            ]
          },
          {
            "tags": [
              {
                "name": "Bundler-App-Name",
                "value": "Redstone"
              }
            ]
          },
          {
            "attributes": {
              "owner_address": "-OXcT1sVRSA5eGwt2k6Yuz8-3e3g9WJi5uSE99CWqsBs"
            }
          }
        ]
      }
    },
    {
      "tags": [
        {
          "name": "App-Name",
          "valueStartsWith": "MyApp"
        }
      ]
    }
  ]
}

Real-World Use Cases

Personal Data Gateway

Perfect for individuals who want to process only their own data:

Unbundle Filter:

{
  "and": [
    {
      "not": {
        "or": [
          {
            "tags": [
              {
                "name": "Bundler-App-Name",
                "value": "Warp"
              }
            ]
          },
          {
            "tags": [
              {
                "name": "Bundler-App-Name",
                "value": "Redstone"
              }
            ]
          }
        ]
      }
    },
    {
      "tags": [
        {
          "name": "App-Name",
          "valueStartsWith": "MyApp"
        }
      ]
    }
  ]
}

Index Filter:

{
  "attributes": {
    "owner_address": "YOUR_WALLET_ADDRESS"
  }
}

Application-Specific Service

Ideal for building services around specific applications:

Unbundle Filter:

{
  "tags": [
    {
      "name": "App-Name",
      "valueStartsWith": "MyApp"
    }
  ]
}

Index Filter:

{
  "or": [
    {
      "tags": [
        {
          "name": "ArFS",
          "value": "0.10"
        }
      ]
    },
    {
      "tags": [
        {
          "name": "ArFS",
          "value": "0.11"
        }
      ]
    },
    {
      "tags": [
        {
          "name": "ArFS",
          "value": "0.12"
        }
      ]
    }
  ]
}

Content-Type Focused Gateway

For gateways specializing in specific content types:

Unbundle Filter:

{
  "tags": [
    {
      "name": "Content-Type",
      "valueStartsWith": "image/"
    }
  ]
}

Index Filter:

{
  "and": [
    {
      "tags": [
        {
          "name": "Content-Type",
          "valueStartsWith": "image/"
        }
      ]
    },
    {
      "attributes": {
        "data_size": 100000
      }
    }
  ]
}

Performance Optimization

Worker Configuration

Understanding Default Worker Settings

The gateway uses sensible defaults that work well for most users:

# Default values (no need to set unless customizing)
# ANS104_UNBUNDLE_WORKERS=1 (default: 0, or 1 if filters are set)
# ANS104_DOWNLOAD_WORKERS=5 (default: 5)

# Only adjust if you have specific hardware requirements
# or want to optimize for your system's capabilities

When to Adjust Workers: Only modify worker counts if you have high-performance hardware and want to maximize throughput, or if you're experiencing resource constraints and need to reduce load.

Optimize Data Flushing

Balance between memory usage and database performance:

# For high-memory systems, increase the threshold
DATA_ITEM_FLUSH_COUNT_THRESHOLD=2000

# For low-memory systems, decrease the threshold instead (set only one of the two)
# DATA_ITEM_FLUSH_COUNT_THRESHOLD=500

# Adjust flush interval based on data volume
MAX_FLUSH_INTERVAL_SECONDS=300

Enable Background Processing

# Enable background verification
ENABLE_BACKGROUND_DATA_VERIFICATION=true

# Enable WAL cleanup for better performance
ENABLE_DATA_DB_WAL_CLEANUP=true

Webhook Filters

Two additional filters trigger webhooks. When a processed transaction matches one of the webhook filters, the gateway sends a webhook containing the transaction data to the URLs listed in WEBHOOK_TARGET_SERVERS.

WEBHOOK_INDEX_FILTER=""
WEBHOOK_BLOCK_FILTER=""

WEBHOOK_INDEX_FILTER triggers a webhook when a transaction is indexed. WEBHOOK_BLOCK_FILTER triggers a webhook when a block is processed.

Important Notes

  • All tag names and values are base64url-decoded before matching
  • Owner addresses are automatically converted from owner public keys
  • Empty or undefined filters default to "never match"
  • Tag matching requires all specified tags to match
  • Attribute matching requires all specified attributes to match
  • The filter system supports nested logical operations to any depth, allowing for very precise control over what data gets processed
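The first two notes can be made concrete with a short sketch that uses only the Python standard library. The helper names are hypothetical. The address rule shown is the standard Arweave derivation, base64url of the SHA-256 hash of the raw public key bytes. The 512-byte all-zero key is a placeholder, not a real key.

```python
import base64
import hashlib


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url text."""
    padding = "=" * (-len(text) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(text + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def owner_to_address(owner_b64url: str) -> str:
    """Derive an address: base64url(SHA-256(raw owner public key bytes))."""
    return b64url_encode(hashlib.sha256(b64url_decode(owner_b64url)).digest())


# Tag names and values are stored base64url-encoded and compared after decoding.
print(b64url_decode("Q29udGVudC1UeXBl").decode())  # Content-Type
print(b64url_decode("aW1hZ2UvanBlZw").decode())    # image/jpeg

# Placeholder owner key: 512 zero bytes, used only to show the shape of the result.
placeholder_owner = b64url_encode(bytes(512))
print(len(owner_to_address(placeholder_owner)))    # 43
```

This is why filters are written with readable tag text and 43-character owner addresses rather than encoded tags or full public keys.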

Best Practices

Filter Design

  1. Start Simple - Begin with basic filters and gradually add complexity
  2. Test Thoroughly - Use FILTER_CHANGE_REPROCESS=true when changing filters
  3. Monitor Performance - Watch system resources during processing
  4. Document Changes - Keep track of filter modifications and their effects

Maintenance

  1. Regular Monitoring - Check gateway logs for errors and warnings
  2. Resource Cleanup - Periodically clean up old data and logs
  3. Filter Optimization - Refine filters based on actual data patterns
  4. Backup Configuration - Keep copies of working filter configurations

Troubleshooting

If your gateway stops processing data after changing filters, check:

  • Filter syntax is valid JSON
  • Required environment variables are set
  • Gateway has been restarted after changes
  • System has sufficient resources
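The JSON syntax check can be automated. The sketch below is a hypothetical helper, `check_filters`, that scans .env text for the four filter variables named in this guide and reports any whose value does not parse as a JSON object on a single line. It assumes a simple NAME=value format with optional matching quotes around the value, which may differ from how your loader parses the file.

```python
import json

FILTER_VARS = (
    "ANS104_UNBUNDLE_FILTER",
    "ANS104_INDEX_FILTER",
    "WEBHOOK_INDEX_FILTER",
    "WEBHOOK_BLOCK_FILTER",
)


def check_filters(env_text: str) -> dict[str, str]:
    """Return {variable: problem} for filter variables that fail validation."""
    problems: dict[str, str] = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if name not in FILTER_VARS:
            continue
        # Assumption: a value may be wrapped in one pair of matching quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        if not value:
            continue  # empty filter: nothing to validate
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError as err:
            problems[name] = f"invalid JSON: {err.msg}"
            continue
        if not isinstance(parsed, dict):
            problems[name] = "filter must be a JSON object"
    return problems


sample_env = """
ANS104_UNBUNDLE_FILTER={"always": true}
ANS104_INDEX_FILTER={
  "never": true
}
WEBHOOK_INDEX_FILTER=""
"""

print(check_filters(sample_env))
```

In the sample, ANS104_INDEX_FILTER is reported because its value spans several lines, the same mistake the .env formatting note warns about.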

