Gateway Filters
Configure your AR.IO Gateway to efficiently process and index only the data you need. This comprehensive guide covers advanced filtering techniques, performance optimization, and real-world use cases.
Overview
The AR.IO Gateway uses a flexible JSON-based filtering system to control data processing and indexing. The system provides precise control over which bundles are processed and which data items are indexed for querying.
Understanding the Filtering System
The AR.IO Gateway uses two primary filters to control data processing:
- ANS104_UNBUNDLE_FILTER - Controls which bundles are processed and unbundled
- ANS104_INDEX_FILTER - Controls which data items from unbundled bundles are indexed for querying
By default, gateways process no bundles and index no data items. You must explicitly configure filters to start processing data.
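The two filters act as successive stages: a data item can only be indexed if the bundle containing it was unbundled first. The TypeScript sketch below models that pipeline with plain predicate functions. The Bundle and DataItem shapes and the selectIndexedItems helper are hypothetical illustrations, not the gateway's internal types.

```typescript
// Hypothetical shapes for illustration; the gateway's real types differ.
interface DataItem { id: string; tags: Record<string, string> }
interface Bundle { id: string; tags: Record<string, string>; items: DataItem[] }

type Predicate<T> = (x: T) => boolean;

// Stage 1: the unbundle filter decides which bundles are opened.
// Stage 2: the index filter decides which of their data items become queryable.
function selectIndexedItems(
  bundles: Bundle[],
  unbundleFilter: Predicate<Bundle>,
  indexFilter: Predicate<DataItem>,
): string[] {
  const indexed: string[] = [];
  for (const bundle of bundles) {
    // A bundle that is never unbundled exposes none of its items to stage 2.
    if (!unbundleFilter(bundle)) continue;
    for (const item of bundle.items) {
      if (indexFilter(item)) indexed.push(item.id);
    }
  }
  return indexed;
}

// Models the documented default: both filters match nothing.
const matchNothing = (): boolean => false;

const bundles: Bundle[] = [
  {
    id: "b1",
    tags: { "Bundle-Format": "binary" },
    items: [
      { id: "i1", tags: { "App-Name": "MyApp" } },
      { id: "i2", tags: { "App-Name": "Other" } },
    ],
  },
];

console.log(selectIndexedItems(bundles, matchNothing, matchNothing)); // []
console.log(selectIndexedItems(bundles, () => true, (i) => i.tags["App-Name"] === "MyApp")); // [ 'i1' ]
```

The first call shows why a fresh gateway indexes nothing until both filters are configured.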
Core Environment Variables
Configure Data Management
Optimize data storage and processing:
# Number of new data items before flushing to stable storage
DATA_ITEM_FLUSH_COUNT_THRESHOLD=1000
# Maximum time between flushes (in seconds)
MAX_FLUSH_INTERVAL_SECONDS=600
# Maximum number of data items to queue for indexing
MAX_DATA_ITEM_QUEUE_SIZE=100000
# Enable background verification
ENABLE_BACKGROUND_DATA_VERIFICATION=true
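The two flush settings describe independent limits: a count of new data items and a maximum time between flushes. As a rough mental model (an assumption drawn from the variable descriptions above, not from the gateway's source), a flush occurs when either limit is reached. FlushConfig and shouldFlush are hypothetical names.

```typescript
// Hypothetical model of the two flush triggers; not the gateway's actual code.
interface FlushConfig {
  countThreshold: number; // mirrors DATA_ITEM_FLUSH_COUNT_THRESHOLD
  maxIntervalSeconds: number; // mirrors MAX_FLUSH_INTERVAL_SECONDS
}

// Flush once enough new items have accumulated, or once the maximum
// interval has elapsed, whichever happens first.
function shouldFlush(cfg: FlushConfig, pendingItems: number, secondsSinceLastFlush: number): boolean {
  return pendingItems >= cfg.countThreshold || secondsSinceLastFlush >= cfg.maxIntervalSeconds;
}

const cfg: FlushConfig = { countThreshold: 1000, maxIntervalSeconds: 600 };
console.log(shouldFlush(cfg, 1000, 5)); // true: count threshold reached
console.log(shouldFlush(cfg, 10, 601)); // true: interval elapsed
console.log(shouldFlush(cfg, 10, 30)); // false: neither limit reached
```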
Set Up GraphQL Configuration
Choose between local-only or proxied queries:
# For new gateways - proxy to arweave.net for complete index
GRAPHQL_HOST=arweave.net
GRAPHQL_PORT=443
# For local-only queries (uncomment to use)
# GRAPHQL_HOST=
Filter Construction
.env formatting
While the filters below are displayed on multiple lines for readability, they must be stored in the .env file as a single line for proper processing.
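One way to produce that single line is to parse the pretty-printed JSON and re-serialize it without indentation. A minimal sketch follows; toEnvLine is a hypothetical helper. Parsing first has the side benefit of catching JSON syntax errors before the gateway sees them.

```typescript
// Collapse a pretty-printed filter into the single-line form a .env file expects.
function toEnvLine(name: string, prettyFilter: string): string {
  // JSON.parse throws on invalid JSON, surfacing syntax errors early.
  const parsed: unknown = JSON.parse(prettyFilter);
  // JSON.stringify without an indent argument emits compact, single-line JSON.
  return `${name}=${JSON.stringify(parsed)}`;
}

const pretty = `{
  "tags": [
    { "name": "App-Name", "value": "MyApp-v1.0" }
  ]
}`;

console.log(toEnvLine("ANS104_INDEX_FILTER", pretty));
// ANS104_INDEX_FILTER={"tags":[{"name":"App-Name","value":"MyApp-v1.0"}]}
```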
Basic Filters
The simplest filters you can use are "always" and "never" filters. The "never" filter is the default behavior and will match nothing, while the "always" filter matches everything.
{
"never": true
}
{
"always": true
}
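Because an unset or empty filter behaves like "never", a loader can make that default explicit. The sketch below handles only the two basic filters; loadFilter and matchesBasic are hypothetical helpers, not gateway functions.

```typescript
type Filter = Record<string, unknown>;

// Documented default: an unset or empty filter behaves like {"never": true}.
function loadFilter(raw: string | undefined): Filter {
  if (raw === undefined || raw.trim() === "") return { never: true };
  return JSON.parse(raw) as Filter;
}

// Evaluates only the two basic filters; anything else is out of scope here.
function matchesBasic(filter: Filter): boolean {
  if (filter["always"] === true) return true;
  if (filter["never"] === true) return false;
  throw new Error("not a basic filter; this sketch handles only always/never");
}

console.log(matchesBasic(loadFilter(undefined))); // false
console.log(matchesBasic(loadFilter('{"always": true}'))); // true
```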
Tag Filters
Tag filters allow you to match items based on their tags in three different ways. You can match exact tag values, check for the presence of a tag regardless of its value, or match tags whose values start with specific text. All tag names and values are automatically base64url-decoded before matching.
{
"tags": [
{
"name": "Content-Type",
"value": "image/jpeg"
}
]
}
{
"tags": [
{
"name": "App-Name"
}
]
}
{
"tags": [
{
"name": "Protocol",
"valueStartsWith": "AO"
}
]
}
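The three tag-matching modes can be sketched as a small function. This assumes item tags arrive base64url-encoded, as in raw data items, and follows the rule that every listed tag condition must be satisfied. RawTag, TagCondition, and matchTags are hypothetical names, not the gateway's implementation.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical shapes: tags as stored (base64url) and conditions as written in a filter.
interface RawTag { name: string; value: string }
interface TagCondition { name: string; value?: string; valueStartsWith?: string }

const decode = (b64url: string): string => Buffer.from(b64url, "base64url").toString("utf8");

function matchTags(conditions: TagCondition[], itemTags: RawTag[]): boolean {
  // Decode names and values before comparing, as the filter system does.
  const decoded = itemTags.map((t) => ({ name: decode(t.name), value: decode(t.value) }));
  // Every condition must be satisfied by at least one tag.
  return conditions.every((cond) =>
    decoded.some((tag) => {
      if (tag.name !== cond.name) return false;
      if (cond.value !== undefined) return tag.value === cond.value; // exact value
      if (cond.valueStartsWith !== undefined) return tag.value.startsWith(cond.valueStartsWith); // prefix
      return true; // name only: presence check
    }),
  );
}

const encode = (s: string): string => Buffer.from(s, "utf8").toString("base64url");
const tags: RawTag[] = [
  { name: encode("Protocol"), value: encode("AO") },
  { name: encode("App-Name"), value: encode("aos") },
];

console.log(matchTags([{ name: "Protocol", valueStartsWith: "AO" }], tags)); // true
console.log(matchTags([{ name: "Content-Type", value: "image/jpeg" }], tags)); // false
```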
Attribute Filters
Attribute filtering allows you to match items based on their metadata properties. The system automatically converts owner public keys to addresses, so you can filter by owner address directly. You can combine multiple attributes in a single filter; all of them must match:
{
"attributes": {
"owner_address": "xyz123...",
"data_size": 1000
}
}
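The owner-address conversion relies on the standard Arweave rule that an address is the base64url-encoded SHA-256 digest of the owner public key. The sketch below applies that rule and compares attribute values by strict equality. The Item shape and the ownerToAddress and matchAttributes helpers are hypothetical, and the sample key bytes are placeholders, not a real RSA modulus.

```typescript
import { Buffer } from "node:buffer";
import { createHash } from "node:crypto";

// Hypothetical item shape: `owner` is the base64url-encoded owner public key.
interface Item {
  owner?: string;
  [attribute: string]: unknown;
}

// Arweave address = base64url(SHA-256(owner public key bytes)).
function ownerToAddress(ownerB64Url: string): string {
  const digest = createHash("sha256").update(Buffer.from(ownerB64Url, "base64url")).digest();
  return digest.toString("base64url");
}

function matchAttributes(filter: Record<string, unknown>, item: Item): boolean {
  // Every listed attribute must match (logical AND).
  return Object.entries(filter).every(([name, expected]) => {
    if (item[name] === expected) return true; // strict equality, no ranges
    // Derive owner_address on the fly when only the public key is present.
    if (name === "owner_address" && item.owner !== undefined) {
      return ownerToAddress(item.owner) === expected;
    }
    return false;
  });
}

// Placeholder key bytes for illustration only.
const owner = Buffer.from("example-owner-public-key").toString("base64url");
const address = ownerToAddress(owner);
console.log(address.length); // 43
console.log(matchAttributes({ owner_address: address, data_size: 1000 }, { owner, data_size: 1000 })); // true
console.log(matchAttributes({ data_size: 1000 }, { owner, data_size: 999 })); // false
```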
Nested Bundle Filter
The isNestedBundle filter is a specialized filter that checks whether a data item is part of a nested bundle structure. It is particularly useful when you need to identify or process data items in bundles that are contained within other bundles.
{
"isNestedBundle": true
}
Note: When processing nested bundles, be sure to include filters that match the nested bundles in both ANS104_UNBUNDLE_FILTER and ANS104_INDEX_FILTER. The bundle data items (nested bundles) need to be indexed to be matched by the unbundle filter.
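One way to follow this note is to widen both filters with the same nested-bundle clause. The sketch below wraps each base filter in an "or" with {"isNestedBundle": true} and prints the resulting .env lines. The base filters and the withNestedBundles helper are hypothetical; whether this widening suits your gateway depends on how many nested bundles you are prepared to index.

```typescript
type Filter = Record<string, unknown>;

const NESTED_BUNDLE: Filter = { isNestedBundle: true };

// Widen a filter so it also matches nested bundles.
function withNestedBundles(filter: Filter): Filter {
  return { or: [filter, NESTED_BUNDLE] };
}

// Hypothetical base filters for illustration.
const unbundleBase: Filter = { tags: [{ name: "App-Name", valueStartsWith: "MyApp" }] };
const indexBase: Filter = { attributes: { owner_address: "YOUR_WALLET_ADDRESS" } };

// The same nested-bundle clause goes into both filters, as the note requires.
const envLines = [
  `ANS104_UNBUNDLE_FILTER=${JSON.stringify(withNestedBundles(unbundleBase))}`,
  `ANS104_INDEX_FILTER=${JSON.stringify(withNestedBundles(indexBase))}`,
];
console.log(envLines.join("\n"));
```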
Complex Filters Using Logical Operators
For more complex scenarios, the system provides logical operators (AND, OR, NOT) that can be combined to create sophisticated filtering patterns. These operators can be nested to any depth:
{
"and": [
{
"tags": [
{
"name": "App-Name",
"value": "ArDrive-App"
}
]
},
{
"tags": [
{
"name": "Content-Type",
"valueStartsWith": "image/"
}
]
}
]
}
{
"or": [
{
"tags": [
{
"name": "App-Name",
"value": "ArDrive-App"
}
]
},
{
"attributes": {
"data_size": 1000
}
}
]
}
{
"not": {
"tags": [
{
"name": "Content-Type",
"value": "application/json"
}
]
}
}
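To see how these operators compose, the sketch below evaluates a filter recursively against a single item. It is a simplified model, not the gateway's implementation: it assumes tags are already decoded, compares attributes by strict equality without owner-address derivation, reads isNestedBundle from a hypothetical precomputed `nested` field, and expects each filter object to use a single key (combine conditions with "and").

```typescript
// Hypothetical item shape; tags are assumed to be already base64url-decoded.
interface Tag { name: string; value: string }
interface Item { tags: Tag[]; attributes: Record<string, unknown>; nested: boolean }

interface TagCondition { name: string; value?: string; valueStartsWith?: string }
interface Filter {
  always?: boolean;
  never?: boolean;
  tags?: TagCondition[];
  attributes?: Record<string, unknown>;
  isNestedBundle?: boolean;
  and?: Filter[];
  or?: Filter[];
  not?: Filter;
}

function tagMatches(cond: TagCondition, tag: Tag): boolean {
  if (tag.name !== cond.name) return false;
  if (cond.value !== undefined) return tag.value === cond.value;
  if (cond.valueStartsWith !== undefined) return tag.value.startsWith(cond.valueStartsWith);
  return true; // name only: presence check
}

// Recursive evaluation: logical operators delegate to their children.
function matches(filter: Filter, item: Item): boolean {
  if (filter.always) return true;
  if (filter.never) return false;
  if (filter.and) return filter.and.every((f) => matches(f, item));
  if (filter.or) return filter.or.some((f) => matches(f, item));
  if (filter.not) return !matches(filter.not, item);
  if (filter.tags) {
    return filter.tags.every((c) => item.tags.some((t) => tagMatches(c, t)));
  }
  if (filter.attributes) {
    return Object.entries(filter.attributes).every(([k, v]) => item.attributes[k] === v);
  }
  if (filter.isNestedBundle !== undefined) return item.nested === filter.isNestedBundle;
  return false; // empty or unrecognized filter: match nothing
}

const item: Item = {
  tags: [
    { name: "App-Name", value: "ArDrive-App" },
    { name: "Content-Type", value: "image/png" },
  ],
  attributes: { data_size: 2048 },
  nested: false,
};

const andFilter: Filter = {
  and: [
    { tags: [{ name: "App-Name", value: "ArDrive-App" }] },
    { tags: [{ name: "Content-Type", valueStartsWith: "image/" }] },
  ],
};
const notFilter: Filter = { not: { tags: [{ name: "Content-Type", value: "application/json" }] } };

console.log(matches(andFilter, item)); // true
console.log(matches(notFilter, item)); // true
console.log(matches({ or: [{ attributes: { data_size: 1000 } }, { never: true }] }, item)); // false
```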
Filter Configuration Strategies
Process Everything
{
"always": true
}
Process Nothing (Default)
{
"never": true
}
Process Specific App Data
{
"tags": [
{
"name": "App-Name",
"valueStartsWith": "MyApp"
}
]
}
Single Application
{
"tags": [
{
"name": "App-Name",
"value": "MyApp-v1.0"
}
]
}
Multiple Applications
{
"or": [
{
"tags": [
{
"name": "App-Name",
"value": "MyApp-v1.0"
}
]
},
{
"tags": [
{
"name": "App-Name",
"value": "AnotherApp-v2.1"
}
]
}
]
}
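Application with Version Range
A prefix match covers every version whose App-Name begins with MyApp (for example, MyApp-v1.0 or MyApp-v2.0); it is not a numeric range comparison.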
{
"tags": [
{
"name": "App-Name",
"valueStartsWith": "MyApp"
}
]
}
Content Type Filtering
{
"tags": [
{
"name": "Content-Type",
"valueStartsWith": "image/"
}
]
}
Specific File Types
{
"or": [
{
"tags": [
{
"name": "Content-Type",
"value": "application/json"
}
]
},
{
"tags": [
{
"name": "Content-Type",
"value": "text/plain"
}
]
}
]
}
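File Size Filtering
Attribute values are compared for equality, so this filter matches only data items whose data_size is exactly 1,000,000 bytes; it does not set a minimum or maximum size.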
{
"attributes": {
"data_size": 1000000
}
}
Single Owner
{
"attributes": {
"owner_address": "YOUR_WALLET_ADDRESS"
}
}
Multiple Owners
{
"or": [
{
"attributes": {
"owner_address": "WALLET_ADDRESS_1"
}
},
{
"attributes": {
"owner_address": "WALLET_ADDRESS_2"
}
}
]
}
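When the list of owners is long or changes often, generating the "or" block is less error-prone than editing it by hand. A minimal sketch; anyOwner is a hypothetical helper, and the addresses are the placeholders used above.

```typescript
type Filter = Record<string, unknown>;

// Build the "Multiple Owners" pattern from a list of addresses.
function anyOwner(addresses: string[]): Filter {
  if (addresses.length === 0) return { never: true }; // an empty list matches nothing
  if (addresses.length === 1) return { attributes: { owner_address: addresses[0] } };
  return { or: addresses.map((a) => ({ attributes: { owner_address: a } })) };
}

console.log(JSON.stringify(anyOwner(["WALLET_ADDRESS_1", "WALLET_ADDRESS_2"])));
// {"or":[{"attributes":{"owner_address":"WALLET_ADDRESS_1"}},{"attributes":{"owner_address":"WALLET_ADDRESS_2"}}]}
```

Wrapping the result as { not: anyOwner([...]) } gives the exclusion variant for several unwanted wallets.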
Exclude Specific Owners
{
"not": {
"attributes": {
"owner_address": "UNWANTED_WALLET_ADDRESS"
}
}
}
Complex Multi-Condition Filter
{
"and": [
{
"tags": [
{
"name": "App-Name",
"valueStartsWith": "MyApp"
}
]
},
{
"attributes": {
"owner_address": "YOUR_WALLET_ADDRESS"
}
},
{
"not": {
"tags": [
{
"name": "Content-Type",
"value": "application/octet-stream"
}
]
}
}
]
}
Exclude Common Bundlers
{
"and": [
{
"not": {
"or": [
{
"tags": [
{
"name": "Bundler-App-Name",
"value": "Warp"
}
]
},
{
"tags": [
{
"name": "Bundler-App-Name",
"value": "Redstone"
}
]
},
{
"attributes": {
"owner_address": "-OXcT1sVRSA5eGwt2k6Yuz8-3e3g9WJi5uSE99CWqsBs"
}
}
]
}
},
{
"tags": [
{
"name": "App-Name",
"valueStartsWith": "MyApp"
}
]
}
]
}
Real-World Use Cases
Personal Data Gateway
Perfect for individuals who want to process only their own data:
Unbundle Filter:
{
"and": [
{
"not": {
"or": [
{
"tags": [
{
"name": "Bundler-App-Name",
"value": "Warp"
}
]
},
{
"tags": [
{
"name": "Bundler-App-Name",
"value": "Redstone"
}
]
}
]
}
},
{
"tags": [
{
"name": "App-Name",
"valueStartsWith": "MyApp"
}
]
}
]
}
Index Filter:
{
"attributes": {
"owner_address": "YOUR_WALLET_ADDRESS"
}
}
Application-Specific Service
Ideal for building services around specific applications:
Unbundle Filter:
{
"tags": [
{
"name": "App-Name",
"valueStartsWith": "MyApp"
}
]
}
Index Filter:
{
"or": [
{
"tags": [
{
"name": "ArFS",
"value": "0.10"
}
]
},
{
"tags": [
{
"name": "ArFS",
"value": "0.11"
}
]
},
{
"tags": [
{
"name": "ArFS",
"value": "0.12"
}
]
}
]
}
Content-Type Focused Gateway
For gateways specializing in specific content types:
Unbundle Filter:
{
"tags": [
{
"name": "Content-Type",
"valueStartsWith": "image/"
}
]
}
Index Filter:
{
"and": [
{
"tags": [
{
"name": "Content-Type",
"valueStartsWith": "image/"
}
]
},
{
"attributes": {
"data_size": 100000
}
}
]
}
Performance Optimization
Worker Configuration
Understanding Default Worker Settings
The gateway uses sensible defaults that work well for most users:
# Default values (no need to set unless customizing)
# ANS104_UNBUNDLE_WORKERS=1 (default: 0, or 1 if filters are set)
# ANS104_DOWNLOAD_WORKERS=5 (default: 5)
# Only adjust if you have specific hardware requirements
# or want to optimize for your system's capabilities
When to Adjust Workers: Only modify worker counts if you have high-performance hardware and want to maximize throughput, or if you're experiencing resource constraints and need to reduce load.
Optimize Data Flushing
Balance between memory usage and database performance:
# For high-memory systems, increase threshold
# DATA_ITEM_FLUSH_COUNT_THRESHOLD=2000
# For low-memory systems, decrease threshold
# DATA_ITEM_FLUSH_COUNT_THRESHOLD=500
# (uncomment only one of the two lines above)
# Adjust flush interval based on data volume
MAX_FLUSH_INTERVAL_SECONDS=300
Enable Background Processing
# Enable background verification
ENABLE_BACKGROUND_DATA_VERIFICATION=true
# Enable WAL cleanup for better performance
ENABLE_DATA_DB_WAL_CLEANUP=true
Webhook Filters
There are also two filters that trigger webhooks. When the gateway processes a transaction that matches one of the webhook filters, it sends a webhook containing the transaction data to the URLs listed in WEBHOOK_TARGET_SERVERS.
WEBHOOK_INDEX_FILTER=""
WEBHOOK_BLOCK_FILTER=""
The WEBHOOK_INDEX_FILTER is used to trigger a webhook when a transaction is indexed. The WEBHOOK_BLOCK_FILTER is used to trigger a webhook when a block is processed.
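To test webhook delivery, you can point WEBHOOK_TARGET_SERVERS at a small local receiver. The sketch below assumes the gateway delivers webhooks as HTTP POST requests with a JSON body; the payload schema is not described in this guide, so the body is treated as opaque JSON. parseWebhookBody and startWebhookReceiver are hypothetical helpers built on Node's standard http module.

```typescript
import { createServer, IncomingMessage, Server, ServerResponse } from "node:http";

// The payload schema is unspecified here, so the body is parsed as opaque JSON.
function parseWebhookBody(raw: string): unknown {
  return JSON.parse(raw);
}

// Minimal receiver: accepts POST requests, parses the JSON body, logs a short preview.
function startWebhookReceiver(port: number): Server {
  const server = createServer((req: IncomingMessage, res: ServerResponse) => {
    if (req.method !== "POST") {
      res.writeHead(405).end();
      return;
    }
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk: string) => {
      body += chunk;
    });
    req.on("end", () => {
      try {
        const payload = parseWebhookBody(body);
        console.log("webhook received:", JSON.stringify(payload).slice(0, 200));
        res.writeHead(200).end();
      } catch {
        res.writeHead(400).end(); // body was not valid JSON
      }
    });
  });
  server.listen(port);
  return server;
}

// Example: startWebhookReceiver(8080);
```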
Important Notes
- All tag names and values are base64url-decoded before matching
- Owner addresses are automatically converted from owner public keys
- Empty or undefined filters default to "never match"
- Tag matching requires all specified tags to match
- Attribute matching requires all specified attributes to match their given values exactly
- The filter system supports nested logical operations to any depth, allowing for very precise control over what data gets processed
Best Practices
Filter Design
- Start Simple - Begin with basic filters and gradually add complexity
- Test Thoroughly - Use FILTER_CHANGE_REPROCESS=true when changing filters
- Monitor Performance - Watch system resources during processing
- Document Changes - Keep track of filter modifications and their effects
Maintenance
- Regular Monitoring - Check gateway logs for errors and warnings
- Resource Cleanup - Periodically clean up old data and logs
- Filter Optimization - Refine filters based on actual data patterns
- Backup Configuration - Keep copies of working filter configurations
Troubleshooting
If your gateway stops processing data after changing filters, check that:
- Filter syntax is valid JSON
- Required environment variables are set
- The gateway has been restarted after changes
- The system has sufficient resources
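The first check can be partly automated. The sketch below parses a filter and walks it recursively, reporting invalid JSON and any key outside the set this guide documents (always, never, tags, attributes, isNestedBundle, and, or, not). The validateFilter and walk helpers are hypothetical, and a newer gateway release may accept keys not listed here.

```typescript
// Keys documented in this guide; newer gateway versions may accept others.
const KNOWN_KEYS = new Set(["always", "never", "tags", "attributes", "isNestedBundle", "and", "or", "not"]);

// Returns a list of problems; an empty list means the filter looks structurally valid.
function validateFilter(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return [`invalid JSON: ${(err as Error).message}`];
  }
  const problems: string[] = [];
  walk(parsed, "$", problems);
  return problems;
}

function walk(node: unknown, path: string, problems: string[]): void {
  if (typeof node !== "object" || node === null || Array.isArray(node)) {
    problems.push(`${path}: expected a filter object`);
    return;
  }
  for (const [key, value] of Object.entries(node)) {
    if (!KNOWN_KEYS.has(key)) {
      problems.push(`${path}: unknown key "${key}"`);
    } else if (key === "and" || key === "or") {
      // Logical lists: validate every child filter.
      if (!Array.isArray(value)) problems.push(`${path}.${key}: expected an array`);
      else value.forEach((child, i) => walk(child, `${path}.${key}[${i}]`, problems));
    } else if (key === "not") {
      walk(value, `${path}.not`, problems);
    } else if (key === "tags" && !Array.isArray(value)) {
      problems.push(`${path}.tags: expected an array`);
    }
  }
}

console.log(validateFilter('{"and": [{"not": {"tagz": []}}]}'));
```

The example reports a misspelled key nested two levels deep, a mistake that valid JSON alone would not reveal.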
Next Steps
Now that you understand gateway filtering, continue building your infrastructure:
- Set Up Monitoring - Deploy Grafana to visualize your gateway's performance metrics
- Add ClickHouse - Improve query performance with ClickHouse and Parquet integration
- Deploy Bundler - Accept data uploads directly through your gateway
- Run Compute Unit - Execute AO processes locally for maximum efficiency