Sitecore Search: Incremental Updates vs Delta Crawling

Incremental Updates & Delta Crawling in Sitecore Search

Posted on June 2, 2026 • 7 minutes • 1318 words

Table of contents

⚠️ It Looked Simple. Until It Wasn't
🔄 The Real Problem: Crawlers Don't Know What Just Changed
🗺️ Delta Crawling: Smarter Scheduled Crawls
📡 Incremental Updates: Near-Real-Time Indexing via API
🏛️ The Architecture That Actually Works
- ❓ Surprise #4: The GraphQL assumption is an oversimplification for component-heavy pages
- 🚦 Surprise #5: Protect against big publish events
  - 🏰 Full Publish Protection
⚖️ Decision Framework: Delta Crawling vs Incremental Updates
- 🌳 Quick decision tree
📋 The Solutioning Checklist Nobody Gives You
🏁 Wrapping Up
🧾Credit/References

⚠️ It Looked Simple. Until It Wasn't

You know that moment in the planning workshop. Someone says:

“Sitecore Search indexing? It’s just a crawler - should be straightforward, right? Just set up the crawler - it’s out of the box.”

Famous last words in almost every Sitecore Search solutioning meeting I've sat in.

Then reality hits.

Marketing Wants Immediate Visibility

The marketing team expects the articles they publish to show up in search (more or less) immediately.

Suddenly you're debugging why updates take hours, why lastmod isn't behaving as expected, and why enabling one feature quietly disables another.

This article dives deep into Incremental Updates and Delta Crawling in Sitecore Search - two powerful features that look simple on paper but reveal important surprises during real implementations.

Before diving in, I recommend reading my previous article on Content Indexing with Sitecore Search for the foundational concepts. This article picks up where that one left off, focusing on the two mechanisms that decide how your index stays in sync: Delta Crawling and Incremental Updates.

🔄 The Real Problem: Crawlers Don't Know What Just Changed

By default, a Sitecore Search web crawler does exactly what its name implies - it crawls. All of your configured URLs. Every single one. On every scheduled run.

For a small site, this is fine. But for a newsroom, product catalog, or large enterprise content hub, a full recrawl can take hours - and a once-a-day schedule doesn’t match how fast content teams publish and update.

Sitecore Search gives you two levers to tackle this problem:

1. Delta Crawling (Smarter scheduled crawls)

2. Incremental Updates (API-driven, near-real-time pushes)

Historically, you had to pick one, because enabling one could disable the other. Community updates suggest this is improving in newer releases for eligible sources - but it’s still worth validating the behavior in your own tenant.

This diagram shows the two common paths to keep the index fresh - sitemap-driven vs event-driven.

🗺️ Delta Crawling: Smarter Scheduled Crawls

Delta crawling is an optimisation of the standard web crawler. Instead of re-crawling every URL on every run, it only re-crawls URLs that have changed since the last run. It does this by reading the lastmod field from your sitemap.

When delta crawling works well

You use a sitemap or sitemap_index trigger
Your sitemap includes a reliable lastmod
Your crawler depth is set to 0

Delta crawling flow

This flow makes it clear why lastmod quality is so important: the crawler decides to crawl or skip purely on change detection.
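To make the crawl-or-skip decision concrete, here is a minimal Python sketch of the idea. It is a model, not Sitecore's implementation: it assumes the crawler compares each sitemap lastmod with the time of its previous successful run, treats a date-only value as midnight UTC, and crawls any URL that has no lastmod. `parse_lastmod` and `urls_to_recrawl` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
SM_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value: str) -> datetime:
    """Parse a W3C date or datetime; a date-only value becomes midnight UTC (assumption)."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def urls_to_recrawl(sitemap_xml: str, last_run: datetime) -> list[str]:
    """Return URLs whose lastmod is newer than the previous run; skip the rest."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SM_NS):
        loc = (url.findtext("sm:loc", namespaces=SM_NS) or "").strip()
        lastmod = (url.findtext("sm:lastmod", namespaces=SM_NS) or "").strip()
        # No lastmod means nothing to compare against, so crawl it to be safe.
        if not lastmod or parse_lastmod(lastmod) > last_run:
            changed.append(loc)
    return changed


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/news/a</loc><lastmod>2026-05-26T15:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/news/b</loc><lastmod>2026-05-20</lastmod></url>
</urlset>"""
last_run = datetime(2026, 5, 26, 10, 0, tzinfo=timezone.utc)
print(urls_to_recrawl(sitemap, last_run))  # ['https://example.com/news/a']
```

The second URL is skipped because its lastmod predates the previous run, which is exactly why the quality of lastmod decides what gets refreshed.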

🔔 Surprise #1: lastmod is often a Date, not a DateTime

If your sitemap uses date-only lastmod (for example 2026-05-26), multiple edits on the same day can look identical to the crawler. The end result is a quiet mismatch between what editors expect and what the crawler can detect.
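Using a simplified model (a hypothetical `would_recrawl` helper that re-crawls only when lastmod is newer than the previous run, reading a date-only value as midnight UTC; both are assumptions, not documented Sitecore behavior), an afternoon edit after a morning crawl disappears when lastmod carries no time:

```python
from datetime import datetime, timezone


def would_recrawl(lastmod: str, last_run: datetime) -> bool:
    """Hypothetical model: re-crawl only if lastmod is newer than the previous run."""
    value = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
    if value.tzinfo is None:
        # A date-only lastmod carries no time, so read it as midnight UTC (assumption).
        value = value.replace(tzinfo=timezone.utc)
    return value > last_run


last_run = datetime(2026, 5, 26, 10, 0, tzinfo=timezone.utc)  # morning crawl
# The editor changes the page again at 15:00 the same day.
print(would_recrawl("2026-05-26", last_run))                 # False: the edit is invisible
print(would_recrawl("2026-05-26T15:00:00+00:00", last_run))  # True: the edit is detected
```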

Practical mitigation

If your site updates multiple times per day, consider generating lastmod as a full ISO datetime (where supported in your sitemap generation logic).
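If your sitemap is produced by your own code, the change can be as small as formatting the item's updated timestamp with a time and offset. The sitemaps protocol accepts W3C Datetime values for lastmod, so both `2026-05-26` and `2026-05-26T15:30:00+00:00` are valid. `sitemap_url_entry` is a hypothetical helper, and treating naive timestamps as UTC is an assumption about your CMS data.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def sitemap_url_entry(loc: str, updated: datetime) -> str:
    """Render one <url> entry whose lastmod is a full W3C datetime (seconds precision)."""
    if updated.tzinfo is None:
        # Assumption: naive timestamps coming from the CMS are UTC.
        updated = updated.replace(tzinfo=timezone.utc)
    lastmod = updated.astimezone(timezone.utc).isoformat(timespec="seconds")
    # escape() turns & < > into XML entities, which query strings in URLs often need.
    return f"<url><loc>{escape(loc)}</loc><lastmod>{lastmod}</lastmod></url>"


print(sitemap_url_entry("https://example.com/news/a?x=1&y=2", datetime(2026, 5, 26, 15, 30)))
# <url><loc>https://example.com/news/a?x=1&amp;y=2</loc><lastmod>2026-05-26T15:30:00+00:00</lastmod></url>
```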

📡 Incremental Updates: Near-Real-Time Indexing via API

Incremental Updates take a different approach: instead of the crawler discovering changes later, you push the updated document to the index through the Sitecore Search Ingestion API.
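The shape of such a push, sketched in Python with only the standard library: the request is built but not sent. The host, the path layout, the `Authorization` header and the `{"document": {"id", "fields"}}` body are assumptions for illustration, so check every one of them against the Ingestion API reference for your tenant before relying on it. `build_update_request` is a hypothetical helper.

```python
import json
import urllib.request

# ASSUMPTIONS: the host, path layout, auth header and body shape below are placeholders.
# Confirm each against the Sitecore Search Ingestion API reference before use.
INGESTION_BASE = "https://ingestion.example.invalid"  # hypothetical host


def build_update_request(domain_id: str, source_id: str, entity: str, doc_id: str,
                         fields: dict, api_key: str,
                         locale: str = "en_us") -> urllib.request.Request:
    """Build (but do not send) a request that pushes one document to the index."""
    path = (f"/ingestion/v1/domains/{domain_id}/sources/{source_id}"
            f"/entities/{entity}/documents/{doc_id}?locale={locale}")
    body = json.dumps({"document": {"id": doc_id, "fields": fields}}).encode("utf-8")
    return urllib.request.Request(
        INGESTION_BASE + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )


req = build_update_request("12345", "67890", "content", "news-a",
                           {"title": "Updated headline", "url": "https://example.com/news/a"},
                           api_key="<ingestion-api-key>")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=10)
```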

Enabling Incremental Updates (high-level)

In Sitecore Search, open your source under Sources
Edit Incremental Updates
Turn on ENABLE INCREMENTAL UPDATES and publish the source

Incremental update sequence

This sequence diagram shows the end-to-end publish-to-index path.
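In middleware terms, that path usually splits into a webhook handler that queues work and a worker that pushes it. The sketch below assumes a hypothetical webhook payload of the form `{"items": [{"id", "language", "url"}]}`; real SitecoreAI (XM Cloud) webhook payloads differ, and `handle_publish_webhook`, `drain` and the injected `push` callable are illustrative names only.

```python
import json
from collections import deque


def handle_publish_webhook(raw_body: str, queue: deque) -> int:
    """Turn one (hypothetical) publish webhook body into queued index updates."""
    payload = json.loads(raw_body)
    queued = 0
    for item in payload.get("items", []):
        if not item.get("url"):
            continue  # nothing to index without a page URL in this payload shape
        queue.append({"id": item["id"], "locale": item.get("language", "en"), "url": item["url"]})
        queued += 1
    return queued


def drain(queue: deque, push) -> int:
    """Worker side: hand each queued document to `push` (e.g. an Ingestion API call)."""
    pushed = 0
    while queue:
        push(queue.popleft())
        pushed += 1
    return pushed


q = deque()
body = json.dumps({"items": [{"id": "a1", "url": "https://example.com/news/a"}, {"id": "ds9"}]})
queued = handle_publish_webhook(body, q)  # 1: the item without a URL is skipped
sent = []
pushed = drain(q, sent.append)            # 1 document handed to the push function
```

Separating the quick enqueue from the push is one common way to keep webhook responses fast; it is a design choice, not a Sitecore requirement.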


🧱 Surprise #2: Incremental Updates vs Delta Crawling behavior

Teams often assume they can just turn both on. Historically, enabling Incremental Updates could disable delta crawling, so always validate this behavior in your environment.

🧨 Surprise #3: Full recrawls can overwrite API-pushed changes

A scheduled (or manually triggered) full crawl can overwrite changes made via the Ingestion API if those changes are not represented in the original content source. This is why many teams keep a scheduled full crawl as a safety net - and treat the CMS as the source of truth.
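A toy model makes the overwrite easy to see. It assumes a full crawl replaces each indexed document with whatever the crawler extracts from the rendered page; `api_push` and `full_crawl` are hypothetical stand-ins, not Sitecore APIs.

```python
def api_push(index: dict, url: str, fields: dict) -> None:
    """Incremental update (toy): patch fields on the indexed document."""
    index.setdefault(url, {}).update(fields)


def full_crawl(index: dict, rendered_pages: dict) -> None:
    """Full crawl (toy): replace each document with what is extracted from the page."""
    for url, extracted in rendered_pages.items():
        index[url] = dict(extracted)


cms_pages = {"https://example.com/news/a": {"title": "Original headline"}}
index = {}
full_crawl(index, cms_pages)
api_push(index, "https://example.com/news/a", {"title": "API-only headline"})  # never saved in the CMS
full_crawl(index, cms_pages)  # scheduled safety-net crawl
print(index["https://example.com/news/a"]["title"])  # Original headline
```

Pushing only values that are already saved in the CMS keeps the two paths consistent, since the next crawl then extracts the same content.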

🏛️ The Architecture That Actually Works

A reliable pattern for SitecoreAI (XM Cloud) + Sitecore Search looks like this:

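[Architecture diagram: publish in SitecoreAI (XM Cloud) => webhook => middleware (for example an Azure Function) => Sitecore Search Ingestion API, with a scheduled full crawl as the safety net]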

❓ Surprise #4: The GraphQL assumption is an oversimplification for component-heavy pages

In practice, the web crawler is powerful because it indexes rendered page output (including components inserted into placeholders) without needing you to model every component field. A middleware-driven incremental update, by contrast, typically queries item fields (for example via GraphQL), so it needs explicit knowledge of which fields to fetch - and that can become fragile as the component library grows.

One alternative (worth prototyping) is an ingestion approach that creates or updates a document from a URL using the crawler’s extractor, so you keep the rendered-output advantage while still doing targeted updates.

🚦 Surprise #5: Protect against big publish events

If a full-site publish triggers thousands of events, your middleware can flood the Ingestion API (and hit request caps).

Two common mitigations are:

Add a Reindex boolean and only trigger incremental updates when true
Use stateful middleware that pauses incremental updates during bulk publish
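The first mitigation might look like this in the middleware. The `Reindex` field name comes from the text above, but reading it from `event["fields"]` and also accepting the raw string `"1"` that a Sitecore checkbox typically stores are assumptions about the payload; `reindex_candidates` and `batches` are hypothetical helpers, and batching only helps with request caps if the caller paces the batches.

```python
from typing import Iterable, Iterator


def reindex_candidates(events: Iterable[dict]) -> list:
    """Keep only events whose (hypothetical) Reindex field is checked."""
    # Accept a real boolean or the raw checkbox string "1" (assumption about the payload).
    return [e for e in events if e.get("fields", {}).get("Reindex") in (True, "1")]


def batches(items: list, size: int) -> Iterator[list]:
    """Split work into fixed-size batches so the caller can pace pushes per batch."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


events = [
    {"id": "a", "fields": {"Reindex": True}},
    {"id": "b", "fields": {"Reindex": False}},
    {"id": "c", "fields": {"Reindex": "1"}},
    {"id": "d"},
]
candidates = reindex_candidates(events)  # events "a" and "c"
print([[e["id"] for e in b] for b in batches(candidates, 1)])  # [['a'], ['c']]
```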

🏰 Full Publish Protection

This state diagram shows a simple mental model for pausing incremental updates during a bulk publish event:

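[State diagram: incremental updates Active => Paused when a bulk publish is detected => Active again on the publish-complete signal or after a timeout]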

How the system decides what to sync and when

Most of the time, the system runs quietly in the background. A content editor hits publish, the webhook fires, and Sitecore Search is updated within seconds. No manual steps, no delays - it just works.

But what happens when someone triggers a full site republish? Suddenly hundreds of items are flying through the pipeline at once. If the system tried to process all of them in real time, things would quickly get messy - rate limits, conflicts, a stressed Azure Function.

So instead, it does something smart. It detects the large publish, pauses the real-time updates, and steps aside. The scheduled crawl handles the heavy lifting once things settle down.

And when the publish is done? The system picks right back up where it left off. If the publish-complete signal never arrives for some reason, a built-in timeout kicks in and the system recovers automatically - so nothing gets permanently stuck.

The result is a search index that stays accurate without you having to babysit it - fast when it can be, patient when it needs to be.
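That behavior can be sketched as a small state machine. The bulk-publish threshold, the timeout and the `PublishGuard` class are hypothetical; how your middleware detects a bulk publish and receives the publish-complete signal depends on your setup. Paused items are simply not pushed, on the assumption that the scheduled crawl picks them up later.

```python
import time
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"  # push each publish event to the Ingestion API
    PAUSED = "paused"  # bulk publish in progress: leave it to the scheduled crawl


class PublishGuard:
    """Hypothetical middleware state: pause on bulk publish, resume on completion or timeout."""

    def __init__(self, bulk_threshold=100, timeout_s=3600.0, clock=time.monotonic):
        self.bulk_threshold = bulk_threshold  # items in one publish that count as "bulk"
        self.timeout_s = timeout_s            # auto-resume if no completion signal arrives
        self.clock = clock
        self.mode = Mode.ACTIVE
        self.paused_at = None

    def on_publish_batch(self, item_count: int) -> bool:
        """Return True if these items should be pushed incrementally right now."""
        self._expire_pause()
        if self.mode is Mode.ACTIVE and item_count >= self.bulk_threshold:
            self.mode, self.paused_at = Mode.PAUSED, self.clock()
        return self.mode is Mode.ACTIVE

    def on_publish_complete(self) -> None:
        """Resume real-time updates once the bulk publish has finished."""
        self.mode, self.paused_at = Mode.ACTIVE, None

    def _expire_pause(self) -> None:
        # If the publish-complete signal never arrives, recover after the timeout.
        if self.mode is Mode.PAUSED and self.clock() - self.paused_at >= self.timeout_s:
            self.on_publish_complete()


now = [0.0]  # fake clock for the example
guard = PublishGuard(bulk_threshold=100, timeout_s=600, clock=lambda: now[0])
print(guard.on_publish_batch(3))     # True: normal publish
print(guard.on_publish_batch(2500))  # False: bulk publish detected, paused
print(guard.on_publish_batch(1))     # False: still paused
now[0] = 700.0
print(guard.on_publish_batch(1))     # True: timeout elapsed, auto-resumed
```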

⚖️ Decision Framework: Delta Crawling vs Incremental Updates

Factor	Delta Crawling	Incremental Updates
Trigger	Scheduled crawler run	Webhook + middleware push
Granularity	Changed URLs (via lastmod)	Individual documents/fields
API integration required?	No	Yes (Ingestion API)
Update speed	Next crawl run	Near real-time (queued)
Multiple edits in one day	May miss (date-only lastmod)	Captures updates
Component-heavy pages	Naturally handled (rendered output)	Needs field mapping or URL-based approach
Custom dev effort	Minimal	Medium to High
Risk of API limits	No	Yes (bulk publish protection needed)
Best for	Efficient scheduled re-indexing	Fast updates on high-value content

🌳 Quick decision tree

A lightweight way to align expectations before you estimate custom work:

[Decision tree diagram: choosing between delta crawling and incremental updates]
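The same trade-offs can be written down as a rough helper for workshop discussions. It encodes only a few rows of the comparison table (update speed, same-day edits, custom dev effort), and the field names and rules are hypothetical rather than an official Sitecore rule.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    max_delay_minutes: int             # how fast content must appear after publishing
    crawl_interval_minutes: int        # how often the scheduled crawl runs
    edits_per_item_per_day: int        # how often the same item is updated
    sitemap_has_datetime_lastmod: bool
    can_build_middleware: bool         # budget for webhook + middleware + Ingestion API work


def recommend(req: Requirements) -> str:
    """Rough rule of thumb distilled from the comparison table; not an official rule."""
    # Delta crawling only refreshes on the next crawl run.
    crawl_too_slow = req.max_delay_minutes < req.crawl_interval_minutes
    # Date-only lastmod can hide a second edit on the same day.
    same_day_edits_at_risk = (req.edits_per_item_per_day > 1
                              and not req.sitemap_has_datetime_lastmod)
    if crawl_too_slow:
        if req.can_build_middleware:
            return "incremental updates + scheduled full crawl as safety net"
        return "delta crawling; revisit the freshness requirement"
    if same_day_edits_at_risk:
        return "delta crawling with a full-datetime lastmod"
    return "delta crawling"


print(recommend(Requirements(15, 1440, 3, False, True)))
# incremental updates + scheduled full crawl as safety net
```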

📋 The Solutioning Checklist Nobody Gives You

💬 How quickly must content appear in search after publishing? (minutes, hours, next day)

💬 How many times per day is the same item typically updated?

💬 Do your pages rely on many components / datasources / render-time composition?

💬 Does your sitemap expose lastmod and is it date-only or datetime?

💬 What does a bulk publish look like (volume and frequency)?

💬 Do you need a Reindex flag, stateful middleware, or both?

💬 What's the safety net schedule for a full crawl?

💬 How will you monitor failures in webhook => middleware => ingestion?

💬 Have you budgeted custom implementation work separately from "crawler setup"?

🏁 Wrapping Up

Sitecore Search indexing isn’t hard - but it’s not a one-click configuration either. Delta crawling and incremental updates solve the same problem from different angles. The teams that get this right are the ones that ask the hard questions during solutioning, not during UAT.

🧾Credit/References

- Content Indexing with Sitecore Search
- Enable incremental updates for a crawler
- Web crawler optimizations
- Incrementally updating Search
- MCP vs Copilot vs GenAI Article
- Sitecore Ingestion API (1.0.0)
- Build Custom Sitecore MCP Tools
- Sitecore Dataverse Integration
- Sitecore MCP server
- Sitecore System Fields
- Sitecore Wildcard Pages
- SitecoreAI Performance / Sitecore XM Cloud Performance

Amit Kumar
