Sitecore Search: Incremental Updates vs Delta Crawling
Incremental Updates & Delta Crawling in Sitecore Search
Posted on June 2, 2026 • 7 minutes • 1318 words
Table of contents
- It Looked Simple. Until It Wasn't
- The Real Problem: Crawlers Don't Know What Just Changed
- Delta Crawling: Smarter Scheduled Crawls
- Incremental Updates: Near-Real-Time Indexing via API
- The Architecture That Actually Works
- Decision Framework: Delta Crawling vs Incremental Updates
- The Solutioning Checklist Nobody Gives You
- Wrapping Up
- Credit/References
You know that moment in the planning workshop. Someone says:
"Sitecore Search indexing? It’s just a crawler - should be straightforward, right? Just set up the crawler - it’s out of the box."
Famous last words in almost every Sitecore Search solutioning meeting I've sat in.
Marketing Wants Immediate Visibility
The marketing team has been hoping that the articles they publish will show up in search (sort of) immediately.
This article dives deep into Incremental Updates and Delta Crawling in Sitecore Search - two powerful but complex features that look simple on paper but reveal important surprises during real implementations.
Before diving in, I recommend reading my previous article on Content Indexing with Sitecore Search for the foundational concepts. This article picks up where that one left off, focusing on the two mechanisms that decide how your index stays in sync: Delta Crawling and Incremental Updates.
By default, a Sitecore Search web crawler does exactly what its name implies - it crawls. All of your configured URLs. Every single one. On every scheduled run.
For a small site, this is fine. But for a newsroom, product catalog, or large enterprise content hub, a full recrawl can take hours - and running it once a day doesn’t match how fast content teams actually publish and update.
Sitecore Search gives you two levers to tackle this problem:
1. Delta Crawling (Smarter scheduled crawls)
2. Incremental Updates (API-driven, near-real-time pushes)
Historically you had to pick one (because enabling one could disable the other). Community updates suggest this is improving in newer releases for eligible sources - but it’s still worth validating the behavior in your own tenant.
This diagram shows the two common paths to keep the index fresh - sitemap-driven vs event-driven.

Delta crawling is an optimisation of the standard web crawler. Instead of re-crawling every URL on every run, it only re-crawls URLs that have changed since the last run. It does this by reading the lastmod field from your sitemap.
When delta crawling works well
- You use a sitemap or sitemap_index trigger
- Your sitemap includes a reliable lastmod
- Your crawler depth is set to 0
Delta crawling flow
This flow makes it clear why lastmod quality is so important: the crawler decides to crawl or skip purely on change detection.
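To make the crawl-or-skip decision concrete, here is a minimal sketch of the idea in Python. It is a mental model only, not Sitecore's actual crawler logic: the function names, the tuple-based sitemap entries, and the comparison against a stored "last crawl" timestamp are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_lastmod(value: str) -> datetime:
    """Parse a sitemap lastmod (date-only or full ISO 8601) into an aware UTC-based datetime."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Date-only values carry no time or zone; treat them as midnight UTC (an assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_to_recrawl(sitemap_entries, last_crawl: datetime):
    """Return only the URLs whose lastmod is newer than the previous crawl run."""
    return [url for url, lastmod in sitemap_entries if parse_lastmod(lastmod) > last_crawl]


# Hypothetical sitemap entries: (loc, lastmod)
entries = [
    ("https://example.com/a", "2026-05-26T09:30:00+00:00"),
    ("https://example.com/b", "2026-05-25"),
    ("https://example.com/c", "2026-05-26"),
]
last_crawl = datetime(2026, 5, 26, 6, 0, tzinfo=timezone.utc)
print(urls_to_recrawl(entries, last_crawl))  # ['https://example.com/a']
```

Notice that `/c` is skipped even if it was edited after 06:00 that day: a date-only value gives this kind of comparison nothing more to work with.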

Surprise #1: lastmod is often a Date, not a DateTime
If your sitemap uses date-only lastmod (for example 2026-05-26), multiple edits on the same day can look identical to the crawler. The end result is a **quiet mismatch** between what editors expect and what the crawler can detect.
Practical mitigation
If your site updates multiple times per day, consider generating lastmod as a full ISO datetime (where supported in your sitemap generation logic).
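As a rough illustration of the difference, the sketch below formats the same two edits both ways. The helper names are hypothetical, and how you hook this into your own sitemap generation depends on your stack.

```python
from datetime import datetime, timezone


def lastmod_date_only(updated_at: datetime) -> str:
    """Date-only lastmod: loses the time of day."""
    return updated_at.date().isoformat()


def lastmod_datetime(updated_at: datetime) -> str:
    """Full ISO 8601 lastmod in UTC, to the second."""
    return updated_at.astimezone(timezone.utc).isoformat(timespec="seconds")


# Two edits to the same item on the same day
morning = datetime(2026, 5, 26, 9, 15, tzinfo=timezone.utc)
afternoon = datetime(2026, 5, 26, 16, 40, tzinfo=timezone.utc)

print(lastmod_date_only(morning) == lastmod_date_only(afternoon))  # True
print(lastmod_datetime(morning))    # 2026-05-26T09:15:00+00:00
print(lastmod_datetime(afternoon))  # 2026-05-26T16:40:00+00:00
```

With date-only values the two edits are indistinguishable; with the full datetime the second edit is visibly newer.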
Incremental Updates take a different approach: instead of the crawler discovering changes later, you push the updated document to the index through the Sitecore Search Ingestion API.
Enabling Incremental Updates (high-level)
- In Sitecore Search, open your source under Sources
- Edit Incremental Updates
- Turn on ENABLE INCREMENTAL UPDATES and publish the source
Incremental update sequence
This sequence diagram shows the end-to-end publish-to-index path.
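For a feel of what the middleware's final step might look like, here is a sketch that only builds the HTTP request and does not send it. Treat every specific as an assumption to verify against the Ingestion API reference for your tenant: the base URL, the path shape, the `{"document": {"fields": ...}}` body, the `Authorization` header carrying the API key, and the IDs used in the example are all illustrative.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed base URL; confirm the host and version for your tenant.
INGESTION_BASE = "https://discover.sitecorecloud.io/ingestion/v1"


def build_update_request(domain_id, source_id, entity, document_id, fields, api_key, locale="en_us"):
    """Build (but do not send) a request that updates one document in the index."""
    segments = ["domains", domain_id, "sources", source_id,
                "entities", entity, "documents", document_id]
    path = "/".join(quote(str(s), safe="") for s in segments)
    url = f"{INGESTION_BASE}/{path}?{urlencode({'locale': locale})}"
    body = json.dumps({"document": {"fields": fields}}).encode("utf-8")  # assumed body shape
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )


# Hypothetical IDs and fields, for illustration only
req = build_update_request(
    "123", "456", "content", "article-1",
    {"title": "Updated headline"}, api_key="<your-api-key>",
)
print(req.get_method(), req.full_url)
```

In a real middleware you would pass the request to `urllib.request.urlopen(req, timeout=10)` (or your HTTP client of choice) and log non-2xx responses so failed pushes are visible.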

Surprise #2: Incremental Updates vs Delta Crawling behavior
Teams often assume they can just turn both on. Historically, enabling Incremental Updates could disable delta crawling, so always validate this behavior in your environment.
Surprise #3: Full recrawls can overwrite API-pushed changes
A scheduled (or manually triggered) full crawl can overwrite changes made via the Ingestion API if those changes are not represented in the original content source. This is why many teams keep a scheduled full crawl as a safety net - and treat the CMS as the source of truth.
A reliable pattern for SitecoreAI (XM Cloud) + Sitecore Search looks like this:

Surprise #4: The GraphQL assumption is an oversimplification for component-heavy pages
In practice, the web crawler is powerful because it indexes rendered page output (including components inserted into placeholders) without needing you to model every component field. A middleware-driven incremental update often needs explicit knowledge of which fields to fetch - and that can become fragile as the component library grows.
One alternative (worth prototyping) is using an ingestion approach that creates/updates a document based on a URL using the crawler’s extractor, so you keep the rendered output advantage while still doing targeted updates.
Surprise #5: Protect against big publish events
If a full-site publish triggers thousands of events, your middleware can flood the Ingestion API (and hit request caps).
Two common mitigations are:
- Add a Reindex boolean and only trigger incremental updates when true
- Use stateful middleware that pauses incremental updates during bulk publish
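The first mitigation can be as small as a filter in the webhook handler. The sketch below assumes a hypothetical payload where each published item is a dict carrying a boolean `reindex` field; your webhook payload and field name will differ.

```python
def items_to_push(published_items):
    """Keep only items whose (hypothetical) 'reindex' flag is explicitly True."""
    return [item for item in published_items if item.get("reindex") is True]


# Hypothetical webhook payload after a publish
payload = [
    {"id": "article-1", "reindex": True},
    {"id": "settings-item", "reindex": False},
    {"id": "legacy-page"},  # flag missing: treated as "do not push"
]
print([item["id"] for item in items_to_push(payload)])  # ['article-1']
```

Defaulting a missing flag to "do not push" keeps a full-site publish of untagged items from turning into thousands of API calls.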
This state diagram shows a simple mental model for pausing incremental updates during a bulk publish event:

How the system decides what to sync and when
Most of the time, the system runs quietly in the background. A content editor hits publish, the webhook fires, and Sitecore Search is updated within seconds. No manual steps, no delays - it just works.
But what happens when someone triggers a full site republish? Suddenly hundreds of items are flying through the pipeline at once. If the system tried to process all of them in real-time, things would quickly get messy - rate limits, conflicts, a stressed Azure Function.
So instead, it does something smart. It detects the large publish, pauses the real-time updates, and steps aside. The scheduled crawl handles the heavy lifting once things settle down.
And when the publish is done? The system picks right back up where it left off. If the publish complete signal never arrives for some reason, a built-in timeout kicks in and auto-recovers - so nothing ever gets permanently stuck.
The result is a search index that stays accurate without you having to babysit it - fast when it can be, patient when it needs to be.
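Here is one way that pause-and-resume behavior could be sketched. The class name, the sliding-window detection of a "large publish", and all thresholds are assumptions for illustration; the state is also held in memory, whereas a serverless function would need to keep it in a shared store.

```python
import time


class PublishGate:
    """Pauses real-time pushes during a bulk publish and auto-recovers after a timeout."""

    def __init__(self, bulk_threshold=100, window_seconds=60.0,
                 pause_timeout_seconds=1800.0, clock=time.monotonic):
        self.bulk_threshold = bulk_threshold              # events per window that count as "bulk"
        self.window_seconds = window_seconds
        self.pause_timeout_seconds = pause_timeout_seconds
        self._clock = clock
        self._event_times = []
        self._paused_at = None

    def should_push(self) -> bool:
        """Record one publish event; return True if it should be pushed in real time."""
        now = self._clock()
        # Auto-recover if the "publish complete" signal never arrived.
        if self._paused_at is not None and now - self._paused_at >= self.pause_timeout_seconds:
            self._resume()
        if self._paused_at is not None:
            return False  # paused: leave this item to the scheduled crawl
        # Keep only events inside the sliding window, then add this one.
        self._event_times = [t for t in self._event_times if now - t < self.window_seconds]
        self._event_times.append(now)
        if len(self._event_times) > self.bulk_threshold:
            self._paused_at = now  # bulk publish detected: step aside
            return False
        return True

    def publish_complete(self) -> None:
        """Call when the bulk publish signals completion."""
        self._resume()

    def _resume(self) -> None:
        self._paused_at = None
        self._event_times = []
```

The injectable `clock` is there so the timeout path can be tested without waiting: pass a fake clock, trip the threshold, advance the time past `pause_timeout_seconds`, and check that pushes resume.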
| Factor | Delta Crawling | Incremental Updates |
|---|---|---|
| Trigger | Scheduled crawler run | Webhook + middleware push |
| Granularity | Changed URLs (via lastmod) | Individual documents/fields |
| API integration required? | No | Yes (Ingestion API) |
| Update speed | Next crawl run | Near real-time (queued) |
| Multiple edits in one day | May miss (date-only lastmod) | Captures updates |
| Component-heavy pages | Naturally handled (rendered output) | Needs field mapping or URL-based approach |
| Custom dev effort | Minimal | Medium to High |
| Risk of API limits | No | Yes (bulk publish protection needed) |
| Best for | Efficient scheduled re-indexing | Fast updates on high-value content |
Quick decision tree
A lightweight way to align expectations before you estimate custom work:

- How quickly must content appear in search after publishing? (minutes, hours, next day)
- How many times per day is the same item typically updated?
- Do your pages rely on many components / datasources / render-time composition?
- Does your sitemap expose lastmod and is it date-only or datetime?
- What does a bulk publish look like (volume and frequency)?
- Do you need a Reindex flag, stateful middleware, or both?
- What's the safety net schedule for a full crawl?
- How will you monitor failures in webhook => middleware => ingestion?
- Have you budgeted custom implementation work separately from "crawler setup"?
Sitecore Search indexing isn’t hard - but it’s not a one-click configuration either. Delta crawling and incremental updates solve the same problem from different angles. The teams that get this right are the ones that ask the hard questions during solutioning, not during UAT.



