Kafka Basics
Updated: March 31, 2025
Topics
- A category or feed name to which records are published
- Identified by unique names within a Kafka cluster
- Store messages in various formats (JSON, Avro, Protobuf, text, binary, custom)
- Split into partitions for distributed data scaling
- Configured with replication factor for fault tolerance
- Support configurable retention policies (time/size based)
- Immutable append-only logs - once written, cannot be modified
- Names are case-sensitive (alphanumeric, dots, underscores, hyphens)
- Internal topics: __consumer_offsets and __transaction_state
Partitions & Offsets
- Topics have one or multiple partitions for parallel processing
- Each partition is an ordered, immutable sequence of records
- Messages in a partition are strictly ordered with sequential offsets
- Offsets are partition-specific identifiers that are immutable
- Each partition starts with offset 0
- Default retention: 7 days (configurable by time/size)
- Oldest messages are removed when retention limits are reached
- Partitions are distributed across brokers for load balancing
- Each has a leader broker and zero or more follower brokers
- Partition count can be increased but not decreased after creation
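The append-only log model above can be sketched as a toy in-memory partition; the class and method names here are illustrative, not a real client API:

```python
import time

class PartitionLog:
    """Toy model of one Kafka partition: an ordered, append-only list of
    (offset, timestamp, value) records with sequential offsets from 0."""

    def __init__(self, retention_seconds=7 * 24 * 3600):  # default: 7 days
        self.records = []           # immutable once appended
        self.next_offset = 0        # first record gets offset 0
        self.retention_seconds = retention_seconds

    def append(self, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        offset = self.next_offset
        self.records.append((offset, ts, value))
        self.next_offset += 1
        return offset

    def enforce_retention(self, now=None):
        """Drop records older than the retention window (oldest first)."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]

log = PartitionLog(retention_seconds=60)
assert log.append("a", timestamp=0) == 0
assert log.append("b", timestamp=100) == 1
log.enforce_retention(now=120)        # "a" (ts=0) is past the 60s window
assert log.append("c", timestamp=120) == 2   # offsets never restart
```

Note that offsets keep increasing even after old records are deleted, which is why offsets identify records for the lifetime of the partition.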
Producer
- Write (publish) data to Kafka topics
- Can specify partition or let Kafka handle assignment
- Message components: key, value, headers, timestamp
- Compression options: none (default), gzip, snappy, lz4, zstd
- Timestamp options: system time (default) or custom
- Support for retries, idempotence, and exactly-once semantics (since 0.11)
Partitioning strategies:
- Null key: round-robin distribution
- Non-null key: consistent hashing (murmur2)
- Custom partitioning possible
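The two default strategies can be sketched as follows. This is a toy: real clients hash non-null keys with murmur2, while crc32 is used here only to keep the sketch dependency-free, and newer clients batch keyless records with a sticky partitioner rather than strict per-record round-robin.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3
_round_robin = count()   # per-producer counter for keyless records

def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Toy partitioner: round-robin for null keys, hash for non-null keys."""
    if key is None:
        return next(_round_robin) % num_partitions       # spread evenly
    return zlib.crc32(key.encode()) % num_partitions     # same key -> same partition

# same key always lands on the same partition (ordering per key is preserved)
assert choose_partition("order-42") == choose_partition("order-42")
# null keys rotate across partitions
assert choose_partition(None) != choose_partition(None)
```

Keeping the same key on the same partition is what gives Kafka per-key ordering; a custom partitioner trades that guarantee for other placement policies.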
Acknowledgment modes (acks):
- acks=0: No acknowledgment (fire and forget)
- acks=1: Leader acknowledgment only (the default before Kafka 3.0; acks=all is the default since 3.0)
- acks=all/-1: Full acknowledgment from leader and all in-sync replicas
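The options above map onto producer configuration properties (Java-client property names). A hedged fragment, with illustrative values rather than defaults for any particular client version:

```python
# Producer configuration sketch (Java-client property names).
# Values are illustrative choices, not universal defaults.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumed local broker address
    "acks": "all",                 # wait for leader + all in-sync replicas
    "retries": 2147483647,         # retry transient send failures
    "enable.idempotence": "true",  # broker de-duplicates retried sends
    "compression.type": "lz4",     # one of: none, gzip, snappy, lz4, zstd
}

assert producer_config["acks"] in ("0", "1", "all", "-1")
```

With `acks=all` and idempotence enabled, retried sends cannot create duplicates in the log, which is the foundation the transactions API builds on.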
Consumers
- Pull (fetch) data from Kafka topics
- Read messages in exact write order within each partition
- Use deserializers for various data formats
- Maintain position by tracking last consumed offset
- Offset reset policies: earliest, latest, none
- Configurable fetch settings for throughput vs. latency optimization
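The interaction of committed offsets and the reset policy can be sketched as a small decision function (names are illustrative; the policy values match `auto.offset.reset`):

```python
def starting_offset(committed, earliest, latest, policy="latest"):
    """Where a consumer begins reading a partition on (re)start.
    `earliest`/`latest` are the partition's current offset bounds."""
    if committed is not None:
        return committed            # resume from the last committed position
    if policy == "earliest":
        return earliest             # replay the whole retained log
    if policy == "latest":
        return latest               # only records produced from now on
    raise RuntimeError("no committed offset and auto.offset.reset=none")

assert starting_offset(committed=42, earliest=0, latest=100) == 42
assert starting_offset(None, 0, 100, policy="earliest") == 0
assert starting_offset(None, 0, 100, policy="latest") == 100
```

The reset policy only matters when no committed offset exists, e.g. for a brand-new group or after committed offsets have expired.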
Consumer Groups
- Organized for parallel processing
- Each consumer assigned exclusive partitions within a group
- Dynamic rebalancing when consumers join/leave
- Consumers in excess of the partition count remain idle
- Identified by unique group.id
- Offsets committed to __consumer_offsets topic
- Managed by a group coordinator broker
- Partition assignment strategies: Range, RoundRobin, Sticky, CooperativeSticky
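The first two assignment strategies are simple enough to sketch; this toy assigns partitions of a single topic and omits the Sticky variants, which additionally try to preserve prior assignments across rebalances:

```python
def range_assign(consumers, partitions):
    """Range strategy (per topic): contiguous blocks, earlier consumers
    get the extra partition when the split is uneven."""
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        size = base + (1 if i < extra else 0)
        assignment[c] = partitions[start:start + size]
        start += size
    return assignment

def round_robin_assign(consumers, partitions):
    """RoundRobin strategy: deal partitions out one at a time."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

parts = [0, 1, 2, 3, 4]
assert range_assign(["c1", "c2"], parts) == {"c1": [0, 1, 2], "c2": [3, 4]}
assert round_robin_assign(["c1", "c2"], parts) == {"c1": [0, 2, 4], "c2": [1, 3]}
```

Either way, each partition goes to exactly one consumer in the group, which is what makes consumer groups a unit of parallel processing.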
Delivery semantics for consumers
At least once (default):
- Commit after processing
- May cause duplicates if failure occurs
- Requires idempotent processing so replayed duplicates are harmless
At most once:
- Commit on receive, before processing
- No reprocessing on failure, potential data loss
Exactly once:
- Via Kafka Transactions API
- Requires idempotent producers and transactional consumers
- Primarily for Kafka-to-Kafka workflows
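The difference between the first two semantics comes down to whether the offset commit happens before or after processing. A toy simulation of a crash between those two steps, with illustrative names:

```python
def run(records, policy, crash_index=None):
    """Simulate one consumer run.
    policy="at_most_once"  -> commit before processing
    policy="at_least_once" -> commit after processing
    crash_index: crash between commit and processing for that record."""
    processed = []
    committed = -1                        # index of last committed record
    for i, rec in enumerate(records):
        if policy == "at_most_once":
            committed = i
            if i == crash_index:
                break                     # crashed before processing: lost
            processed.append(rec)
        else:                             # at_least_once
            processed.append(rec)
            if i == crash_index:
                break                     # crashed before commit: replayed
            committed = i
    restart_from = committed + 1          # where a restarted consumer resumes
    return processed, restart_from

records = ["r0", "r1", "r2"]
# at-least-once: r1 was processed but not committed, so a restart
# re-delivers r1 -> possible duplicate
assert run(records, "at_least_once", crash_index=1) == (["r0", "r1"], 1)
# at-most-once: r1's offset was committed before the crash, so a restart
# skips r1 -> possible data loss
assert run(records, "at_most_once", crash_index=1) == (["r0"], 2)
```

Exactly-once closes this gap by committing the offset and the processing results in one atomic transaction, which is why it needs the transactions API rather than a different commit ordering.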
Kafka brokers
- Distributed system of multiple servers (3-100+)
- Each identified by integer ID
- Connect via bootstrap servers
- Manage partitions, handle requests, manage replication
- Automatic leadership transfer on failure
- Controller broker manages administrative operations
Topic replication factor
- Replication factor = number of copies per partition
- Recommended: factor of 3 for production
- Must be ≤ number of brokers
- Set at topic level, can differ between topics
- With factor N, tolerate N-1 broker failures
Concept of Leader for a Partition
- One leader per partition handles all reads/writes
- Followers passively replicate from leader
- In-Sync Replicas (ISR) = leader + caught-up followers
- With 3 partitions, RF=3, 3 brokers: each partition has ISR=3
- Automatic leadership transfer on failure
- Controller manages elections
Kafka Topic durability
- Replication factor N tolerates N-1 broker failures
- Enhanced by min.insync.replicas setting
- Strongest guarantees: RF=3, min.insync.replicas=2, acks=all
- Trade-off: higher durability vs. latency/throughput
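The interplay between ISR size, min.insync.replicas, and acks can be captured in one predicate. A sketch of the broker's accept/reject decision for a produce request, simplified to ignore other error conditions:

```python
def write_accepted(isr_size, min_insync_replicas, acks):
    """Whether the leader accepts a produce request under these settings.
    With acks=all, an ISR smaller than min.insync.replicas makes the
    broker reject the write (NotEnoughReplicas) rather than silently
    accept it with weaker durability."""
    if acks in ("0", "1"):
        return isr_size >= 1              # only the leader needs to be up
    return isr_size >= min_insync_replicas

# RF=3, min.insync.replicas=2: writes survive one broker loss...
assert write_accepted(isr_size=2, min_insync_replicas=2, acks="all")
# ...but are rejected once a second replica falls out of the ISR
assert not write_accepted(isr_size=1, min_insync_replicas=2, acks="all")
# acks=1 keeps accepting writes, trading durability for availability
assert write_accepted(isr_size=1, min_insync_replicas=2, acks="1")
```

This is the concrete form of the durability/availability trade-off: RF=3 with min.insync.replicas=2 and acks=all guarantees every acknowledged write exists on at least two brokers.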
Zookeeper
- Managed metadata and broker coordination (historically)
- Handled broker registration, configurations, elections
- Stored consumer offsets until 0.10
- Required until Kafka 2.7
- KRaft mode preview in 2.8, official in 3.0
- Production-ready in 3.3.0
- Removed completely in 4.0
- Typically used 3-5 nodes (7 for large clusters)
- Leader-follower ensemble with quorum consensus