
Search Indexing Guidelines

Problem

Dynamic filtering components (like the Contract Addresses Table and DVN Addresses Table) update URLs with query parameters to make filter states shareable:

/deployments/deployed-contracts?chains=ethereum,bsc&stages=mainnet
/deployments/dvn-addresses?dvns=layerzero&chains=polygon

Search indexers like Typesense treat each unique URL as a separate page, which can create hundreds of duplicate entries for the same underlying page content.

Solution

We implement a multi-layered approach to prevent duplicate content indexing:

1. Meta Tags (Primary Solution)

The useSearchIndexing hook automatically adds appropriate meta tags when filters are active:

<!-- When filters are active: -->
<link rel="canonical" href="/deployments/deployed-contracts" />
<meta name="robots" content="noindex, follow" />

How it works:

  • Canonical URL: Points to the base page without query parameters
  • Noindex: Prevents indexing of filtered pages while allowing link following
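
The hook's implementation isn't reproduced here; the sketch below shows one way such a hook could work, assuming React and direct manipulation of document.head (the DOM approach and cleanup behavior are illustrative, not necessarily the actual implementation):

import {useEffect} from 'react';

// Minimal sketch: adds a canonical link and a noindex robots meta tag
// while hasActiveFilters is true, and removes both on cleanup.
// Assumes a browser environment where document/window are available.
export function useSearchIndexing(hasActiveFilters) {
  useEffect(() => {
    if (!hasActiveFilters) return;

    // Canonical URL: the current path without query parameters
    const canonical = document.createElement('link');
    canonical.rel = 'canonical';
    canonical.href = window.location.origin + window.location.pathname;
    document.head.appendChild(canonical);

    // Noindex: keep the filtered page out of the index, but follow links
    const robots = document.createElement('meta');
    robots.name = 'robots';
    robots.content = 'noindex, follow';
    document.head.appendChild(robots);

    // Cleanup: remove the tags when filters are cleared or on unmount
    return () => {
      canonical.remove();
      robots.remove();
    };
  }, [hasActiveFilters]);
}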

2. Robots.txt (Secondary Protection)

Additional robots.txt rules block crawlers from accessing filtered URLs:

# Block crawling of filtered URLs
Disallow: /*?*chains=*
Disallow: /*?*dvns=*
Disallow: /*?*stages=*
Disallow: /*?*page=*
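
In these patterns, * matches any sequence of characters, so a rule like /*?*chains=* blocks any URL whose query string contains a chains= parameter; note that Disallow rules only take effect within the User-agent group they are placed under.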

3. Usage in Components

For any component with filtering, use the hook:

import {useState} from 'react';
import {useSearchIndexing} from '../hooks/useSearchIndexing';

function MyFilterableComponent() {
  const [filters, setFilters] = useState([]);
  const [searchTerm, setSearchTerm] = useState('');

  // Determine if any filters are active (extend with any other filter state)
  const hasActiveFilters = filters.length > 0 || searchTerm.length > 0;

  // Apply search indexing rules while filters are active
  useSearchIndexing(hasActiveFilters);

  // ... rest of component
}
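
Note that the hook is called unconditionally with a boolean argument rather than being called only when filters are active; this keeps the component compliant with React's Rules of Hooks, which require hooks to run on every render.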

Benefits

  1. Prevents Duplicate Content: Search engines won't index multiple versions of the same page
  2. Maintains Shareability: URLs with filters still work for users sharing links
  3. Preserves SEO: Base pages retain their search ranking
  4. Automatic: Works without manual intervention once implemented

Implementation Details

  • Canonical URLs tell search engines which version is the "master"
  • Noindex prevents duplicate content while preserving link equity
  • Robots.txt provides an additional layer of protection
  • Cleanup ensures meta tags are removed when filters are cleared

Best Practices

  1. Always use the hook in any component that modifies the URL based on user interaction
  2. Test thoroughly to ensure meta tags are added and removed correctly (see the test sketch after this list)
  3. Monitor search console for any remaining duplicate content issues
  4. Consider analytics when deciding which filtered pages (if any) should be indexed
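
For item 2, a minimal test sketch using Jest and React Testing Library, assuming the hook behaves like the sketch shown earlier (the Probe component is a hypothetical helper):

import {render} from '@testing-library/react';
import {useSearchIndexing} from '../hooks/useSearchIndexing';

// Hypothetical probe component that does nothing but invoke the hook
function Probe({active}) {
  useSearchIndexing(active);
  return null;
}

test('meta tags are added while filters are active and removed after', () => {
  const {rerender} = render(<Probe active={true} />);
  expect(document.head.querySelector('link[rel="canonical"]')).not.toBeNull();
  expect(document.head.querySelector('meta[name="robots"]')).not.toBeNull();

  // Clearing filters should trigger the hook's cleanup
  rerender(<Probe active={false} />);
  expect(document.head.querySelector('link[rel="canonical"]')).toBeNull();
  expect(document.head.querySelector('meta[name="robots"]')).toBeNull();
});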

Monitoring

Check Google Search Console and other search tools for:

  • Duplicate content warnings
  • Pages with canonical issues
  • Unexpected indexed URLs with query parameters
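
For a quick manual spot check, a site: query combined with the inurl: operator can surface filtered URLs that have slipped into the index (the domain below is a placeholder):

site:your-docs-domain.com inurl:chains=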

References

This approach follows industry best practices used by:

  • Major ecommerce sites (Amazon, eBay)
  • Documentation platforms (Algolia, GitLab)
  • Content management systems (WordPress, Drupal)

See Webflow's pagination approach and ecommerce faceted navigation best practices for more details.