# Search Indexing Guidelines

## Problem

Dynamic filtering components (like the Contract Addresses Table and DVN Addresses Table) update URLs with query parameters to make filter states shareable:

```text
/deployments/deployed-contracts?chains=ethereum,bsc&stages=mainnet
/deployments/dvn-addresses?dvns=layerzero&chains=polygon
```

Search indexers like Typesense treat each unique URL as a separate page, creating hundreds of duplicate entries for the same underlying page content.
## Solution
We implement a multi-layered approach to prevent duplicate content indexing:
### 1. Meta Tags (Primary Solution)

The `useSearchIndexing` hook automatically adds the appropriate meta tags when filters are active:

```html
<!-- When filters are active: -->
<link rel="canonical" href="/deployments/deployed-contracts" />
<meta name="robots" content="noindex, follow" />
```

How it works:

- Canonical URL: points to the base page without query parameters
- Noindex: prevents the filtered page from being indexed while still allowing its links to be followed
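
For reference, the sketch below shows one way a hook like this can manage those tags with a React `useEffect`. The names and details here are illustrative; the actual implementation in `../hooks/useSearchIndexing` may differ.

```tsx
import {useEffect} from 'react';

// Illustrative sketch only; the real hook lives in ../hooks/useSearchIndexing
// and may be implemented differently.
export function useSearchIndexing(hasActiveFilters: boolean): void {
  useEffect(() => {
    if (!hasActiveFilters) return;

    // Canonical link pointing at the base page (current path without the query string)
    const canonical = document.createElement('link');
    canonical.setAttribute('rel', 'canonical');
    canonical.setAttribute('href', window.location.pathname);

    // Robots meta tag: do not index this filtered view, but keep following its links
    const robots = document.createElement('meta');
    robots.setAttribute('name', 'robots');
    robots.setAttribute('content', 'noindex, follow');

    document.head.appendChild(canonical);
    document.head.appendChild(robots);

    // Cleanup: remove the tags when filters are cleared or the component unmounts
    return () => {
      canonical.remove();
      robots.remove();
    };
  }, [hasActiveFilters]);
}
```

Because the cleanup function runs whenever `hasActiveFilters` flips back to `false`, the tags disappear as soon as filters are cleared.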
### 2. Robots.txt (Secondary Protection)

Additional robots.txt rules block crawlers from accessing filtered URLs:

```txt
# Block crawling of filtered URLs
Disallow: /*?*chains=*
Disallow: /*?*dvns=*
Disallow: /*?*stages=*
Disallow: /*?*page=*
```
### 3. Usage in Components

For any component with filtering, use the hook:

```tsx
import {useState} from 'react';
import {useSearchIndexing} from '../hooks/useSearchIndexing';

function MyFilterableComponent() {
  const [filters, setFilters] = useState<string[]>([]);
  const [searchTerm, setSearchTerm] = useState('');

  // Determine if any filters are active (include any other filter conditions here)
  const hasActiveFilters = filters.length > 0 || searchTerm.length > 0;

  // Apply search indexing rules while filters are active
  useSearchIndexing(hasActiveFilters);

  // ... rest of component
}
```
## Benefits
- Prevents Duplicate Content: Search engines won't index multiple versions of the same page
- Maintains Shareability: URLs with filters still work for users sharing links
- Preserves SEO: Base pages retain their search ranking
- Automatic: Works without manual intervention once implemented
## Implementation Details
- Canonical URLs tell search engines which version is the "master"
- Noindex prevents duplicate content while preserving link equity
- Robots.txt provides an additional layer of protection
- Cleanup ensures meta tags are removed when filters are cleared
## Best Practices
- Always use the hook for any component that modifies URLs based on user interaction
- Test thoroughly to ensure meta tags are added and removed correctly (see the test sketch after this list)
- Monitor search console for any remaining duplicate content issues
- Consider analytics when deciding which filtered pages (if any) should be indexed
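
As a starting point for that testing, the sketch below exercises the hook with Jest and `@testing-library/react` in a jsdom environment. The test setup and the exact assertions are assumptions, so adapt them to the tags the hook actually renders.

```tsx
import {renderHook} from '@testing-library/react';
import {useSearchIndexing} from '../hooks/useSearchIndexing';

// Assumes a Jest + jsdom setup and @testing-library/react v13.1+ (which exports renderHook).
test('adds indexing meta tags while filters are active and removes them when cleared', () => {
  const {rerender, unmount} = renderHook(
    ({active}) => useSearchIndexing(active),
    {initialProps: {active: true}},
  );

  // Filters active: canonical link and noindex robots tag should be present in <head>
  expect(document.head.querySelector('link[rel="canonical"]')).not.toBeNull();
  expect(
    document.head.querySelector('meta[name="robots"]')?.getAttribute('content'),
  ).toContain('noindex');

  // Filters cleared: both tags should be removed again
  rerender({active: false});
  expect(document.head.querySelector('link[rel="canonical"]')).toBeNull();
  expect(document.head.querySelector('meta[name="robots"]')).toBeNull();

  unmount();
});
```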
## Monitoring
Check Google Search Console and other search tools for:
- Duplicate content warnings
- Pages with canonical issues
- Unexpected indexed URLs with query parameters
## References
This approach follows industry best practices used by:
- Major ecommerce sites (Amazon, eBay)
- Documentation platforms (Algolia, GitLab)
- Content management systems (WordPress, Drupal)
See Webflow's pagination approach and ecommerce faceted navigation best practices for more details.