Change Data Capture (CDC) Summary
Purpose: Track and propagate data changes (insert, update, delete) from a source to downstream systems.
Write-Ahead Log (WAL): In relational databases (e.g., PostgreSQL), all changes are logged before being applied. CDC reads from the WAL to capture changes.
NoSQL: In systems like DynamoDB, item-level streams capture changes in a similar fashion.
Pipeline Actions:
- Detect changes.
- Send to downstream systems.
- Examples: replicate to read replicas, sync to a search index (Elasticsearch), or forward to Kafka.
Key Considerations:
- Ordering: Ensure changes are applied in order or handle out-of-order events.
- Consistency: Choose between eventual or strong consistency.
- Schema Evolution: Manage schema changes over time.
- Performance: Balance latency vs. throughput and handle backpressure.
Resilience: Ensure fault tolerance—prepare for target system delays or failures.