Defining the Terms
Real-time Integration
Data moves between systems within seconds of an event occurring. When a customer places an order on your website, your warehouse system knows about it within moments.
True real-time typically uses webhooks or event streaming - the source system pushes data as events happen rather than waiting to be asked.
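A minimal sketch of the push model, assuming a hypothetical `order.placed` event delivered as a JSON body. The event shape, the `handle_order_webhook` name, and the in-memory `warehouse_orders` list are illustrative, not any particular platform's API. The receiver does nothing until the source system calls it:

```python
import json

# Hypothetical in-memory stand-in for the warehouse system's order list.
warehouse_orders = []


def handle_order_webhook(body: bytes) -> int:
    """Handle one pushed event and return an HTTP-style status code."""
    try:
        event = json.loads(body)
    except ValueError:
        return 400  # body is not valid JSON
    if not isinstance(event, dict):
        return 400  # valid JSON, but not the object we assume
    if event.get("type") != "order.placed":
        return 204  # acknowledged, but not an event this handler acts on
    order = event.get("order")
    if order is None:
        return 400  # the assumed payload shape requires an "order" field
    warehouse_orders.append(order)  # the warehouse learns of it right away
    return 200
```

In practice a function like this would sit behind an HTTP endpoint, and many webhook providers sign their payloads so the receiver can verify the sender.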
Near Real-time
Data syncs frequently - every few minutes - but not instantaneously. This is often implemented by polling (checking for changes on a schedule) rather than push-based events.
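Polling can be sketched as a loop that asks the source for anything changed since a cursor. The names here (`poll_once`, `run_poller`, an integer `updated_at` field) are assumptions for illustration, and `fetch_changes` stands in for whatever query or API call the source system actually offers:

```python
import time


def poll_once(fetch_changes, apply_change, cursor):
    """Ask the source for records changed after `cursor`; return the new cursor."""
    newest = cursor
    for record in fetch_changes(cursor):
        apply_change(record)
        newest = max(newest, record["updated_at"])
    return newest


def run_poller(fetch_changes, apply_change, cursor=0,
               interval_seconds=300.0, max_polls=None, sleep=time.sleep):
    """Poll on a fixed schedule. `max_polls=None` means run until stopped."""
    polls = 0
    while max_polls is None or polls < max_polls:
        cursor = poll_once(fetch_changes, apply_change, cursor)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)  # wait for the next scheduled check
    return cursor
```

With this shape, data is at most one interval stale, plus however long a poll takes. Shortening the interval buys freshness at the cost of more requests to the source.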
Batch Processing
Data accumulates and moves in scheduled batches - hourly, nightly, or weekly. All changes since the last batch are processed together.
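One thing that processing changes together allows is collapsing several edits to the same record into one. The sketch below assumes each change carries a hypothetical `record_id` and a numeric `changed_at`; coalescing is a design choice made for this example, and a flow that needs every intermediate change (an audit trail, say) would keep them all:

```python
def build_batch(changes, last_batch_at, cutoff):
    """Collect changes in the window (last_batch_at, cutoff].

    Only the latest change per record is kept, so a record edited
    three times since the last run is written once.
    """
    latest = {}
    for change in sorted(changes, key=lambda c: c["changed_at"]):
        if last_batch_at < change["changed_at"] <= cutoff:
            latest[change["record_id"]] = change  # later changes overwrite earlier
    return list(latest.values())
```

The scheduler would call this once per run, passing the previous run's cutoff as `last_batch_at` so that no change is missed or processed twice.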
When Real-time Makes Sense
Real-time Is Justified When
- Users expect immediate feedback: Customer-facing systems where delays are visible and frustrating
- Data has a short shelf life: Stock levels, pricing changes, availability status
- Downstream processes depend on it: Warehouse pick operations waiting for orders
- Regulatory or contractual requirements: Some industries require immediate reporting
- Competitive advantage: Responding faster than competitors matters
Real-time Examples
E-commerce inventory: When stock is low, real-time sync prevents overselling. A customer shouldn't be able to buy the last unit while another customer's order is in a batch queue.
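The last-unit problem can be sketched as a check-and-decrement that happens at order time rather than later in a queue. This `Inventory` class is a hypothetical in-memory illustration; a real system would more likely rely on a database transaction or a conditional update:

```python
import threading


class Inventory:
    """In-memory stock levels with an atomic reserve operation."""

    def __init__(self, stock):
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, sku, qty=1):
        """Reserve `qty` units if available; return True on success."""
        with self._lock:  # check and decrement as one step
            available = self._stock.get(sku, 0)
            if available < qty:
                return False
            self._stock[sku] = available - qty
            return True
```

With one unit in stock, the first `reserve("widget")` returns `True` and the second returns `False`, so the second customer is told immediately instead of being oversold.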
Fraud detection: Checking transactions against fraud rules must happen before the transaction completes - batching is too late.
Customer support context: When a customer calls, the agent needs to see their recent orders immediately, not from last night's batch.
When Batch Is Better
Batch Processing Works Well When
- Data doesn't lose value with time: Historical records, analytics, reporting
- Processing is resource-intensive: Complex transformations, aggregations, validations
- Systems aren't always available: Legacy systems with maintenance windows
- Errors need human review: Data quality issues require investigation before committing
- Cost is a constraint: Batch is usually cheaper than real-time infrastructure
Batch Examples
Payroll processing: Salary calculations need complete data. Processing in real time would mean incomplete calculations as timesheet entries arrive throughout the period.
Data warehouse loading: Analytics queries run on yesterday's complete data, not constantly changing current data.
Month-end reporting: Financial close processes need all transactions finalised before processing - real-time would create moving targets.
Comparison
| Factor | Real-time | Batch |
|---|---|---|
| Data freshness | Seconds (minutes for near real-time) | Hours/days |
| Complexity | Higher - event handling, error recovery | Lower - straightforward ETL |
| Infrastructure cost | Higher - always-on, scalable | Lower - runs periodically |
| Error handling | Must handle immediately | Can review before retry |
| System coupling | Tighter (with synchronous calls) | Looser (files/staging) |
| Testing difficulty | Higher - timing/sequencing issues | Lower - deterministic |
Hybrid Approaches
Most real-world integrations use a mix. You might process orders in real time but sync product catalog changes in nightly batches. Some patterns that combine approaches:
Event-Triggered Batches
A real-time event starts a batch process. An "end of day" event triggers nightly processing. A "file received" event starts a batch import.
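As a rough sketch of the "file received" case: an event arrives, and a dispatcher starts the matching batch job. The event shape (`type`, `payload`), the `BATCH_JOBS` table, and the CSV columns are all assumptions made for this example:

```python
import csv
import io


def import_orders_csv(text):
    """Batch job: parse every row of a received CSV file in one pass."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical mapping from event type to the batch job it triggers.
BATCH_JOBS = {"file_received": import_orders_csv}


def on_event(event):
    """Start the batch job registered for this event type, if any."""
    job = BATCH_JOBS.get(event["type"])
    if job is None:
        return None  # no batch job is triggered by this event
    return job(event["payload"])
```

The trigger is real-time, but the work itself stays batch-style: the whole file is processed together, and nothing runs until the event arrives.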
Micro-batching
Very frequent small batches - every 5 minutes - provide near real-time freshness with batch-style processing. Simpler than full event streaming, fresher than traditional batches.
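A micro-batcher can be sketched as a buffer that is flushed once per interval. The `MicroBatcher` class and its `tick` method are hypothetical names; the sketch assumes something (a scheduler or timer) calls `tick` regularly, and the clock is injectable so the behaviour can be tested without waiting:

```python
import time


class MicroBatcher:
    """Buffer items and hand them to `flush` as one list per interval."""

    def __init__(self, flush, interval_seconds=300.0, clock=time.monotonic):
        self._flush = flush
        self._interval = interval_seconds
        self._clock = clock
        self._buffer = []
        self._window_start = clock()

    def add(self, item):
        self.tick()  # close the previous window first if it has elapsed
        self._buffer.append(item)

    def tick(self):
        """Flush the buffer if the current window has elapsed."""
        now = self._clock()
        if now - self._window_start < self._interval:
            return
        self._window_start = now  # start a new window
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)
```

The downstream code only ever sees lists of items, exactly as in a nightly batch. Only the interval differs.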
Real-time for Critical, Batch for Rest
Prioritise real-time for the 20% of data that matters most. Order status updates might be real-time while customer preference changes sync overnight.
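One way to picture this split is a router that sends critical change types straight through and queues everything else for the overnight run. The `REALTIME_TYPES` set, the `ChangeRouter` class, and the change shape are illustrative assumptions:

```python
# Hypothetical set of change types considered critical enough for real-time.
REALTIME_TYPES = {"order_status"}


class ChangeRouter:
    """Send critical changes immediately; queue the rest for a nightly batch."""

    def __init__(self, send_now):
        self._send_now = send_now
        self._overnight = []

    def route(self, change):
        if change["type"] in REALTIME_TYPES:
            self._send_now(change)
            return "real-time"
        self._overnight.append(change)
        return "batch"

    def run_nightly(self, send_batch):
        """Deliver everything queued since the last run as one batch."""
        batch, self._overnight = self._overnight, []
        if batch:
            send_batch(batch)
        return len(batch)
```

Keeping the list of real-time types explicit and short makes it easy to see, and to challenge, which flows are paying the real-time complexity cost.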
Making the Decision
Ask the Right Questions
- What breaks if data is 1 hour old? 1 day old? If nothing critical breaks, batch might be fine.
- Who is impacted by delays? Customers notice more than back-office staff.
- What's the cost of complexity? Real-time requires more sophisticated error handling, monitoring, and recovery.
- What are the source system's capabilities? Can it push events, or only respond to polling?
- What's the data volume pattern? A steady flow suits real-time. Spikes can overwhelm a real-time pipeline unless it buffers events.
Principle: Start with batch unless there's a clear business reason for real-time. It's easier to add real-time capabilities later than to simplify an overly complex real-time architecture.
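The questions and the principle above can be turned into a rough rule of thumb. The function name and the thresholds (60 minutes and 5 minutes) are illustrative assumptions rather than fixed rules; the point is that batch is the default and the answers have to justify anything more complex:

```python
def recommend_approach(tolerable_staleness_minutes, source_can_push):
    """Suggest the simplest approach that meets the freshness requirement."""
    if tolerable_staleness_minutes >= 60:
        return "batch"  # nothing breaks if data is an hour old
    if tolerable_staleness_minutes >= 5:
        return "micro-batch"  # a few minutes of delay is acceptable
    # Only seconds of delay are tolerable: push if the source supports it.
    return "real-time" if source_can_push else "frequent polling"
```

For example, `recommend_approach(24 * 60, True)` returns `"batch"` even though the source could push events, because nothing in the requirements calls for it.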
Summary
Real-time integration sounds impressive, but batch processing often provides the best balance of simplicity, cost, and reliability. The right choice depends on your specific requirements - how fresh data needs to be, what your systems can support, and what complexity your team can manage.
Don't default to real-time because it seems modern. Default to the simplest approach that meets your actual requirements.
Key takeaways
- Real-time integration is essential when users or systems need immediate data.
- Batch processing is better for large data volumes, cost efficiency, and reporting.
- Most businesses need a hybrid approach: real-time for critical flows, batch for bulk.
- Start with the user impact: who needs what data, and how quickly?