Developer Playground

About Kafka Basics

Updated: March 31, 2025

Topics

  • A category or feed name to which records are published
  • Identified by unique names within a Kafka cluster
  • Store messages in various formats (JSON, Avro, Protobuf, text, binary, custom)
  • Split into partitions for distributed data scaling
  • Configured with replication factor for fault tolerance
  • Support configurable retention policies (time/size based)
  • Immutable append-only logs - once written, cannot be modified
  • Names are case-sensitive (alphanumeric, dots, underscores, hyphens)
  • Internal topics: __consumer_offsets and __transaction_state
[Diagram: Kafka topics and partitions — Topics A and B each split into partitions, with messages identified by sequential offsets]
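The naming rules above can be checked with a short validator. This is a minimal sketch based on the stated rules (legal characters, a length limit of 249, and the reserved names "." and ".."); the function name is illustrative, not a Kafka API:

```python
import re

# Kafka topic names may contain ASCII letters, digits, '.', '_' and '-',
# are case-sensitive, and are limited to 249 characters.
# "." and ".." are reserved and cannot be topic names.
LEGAL_TOPIC = re.compile(r"^[a-zA-Z0-9._-]{1,249}$")

def is_valid_topic_name(name: str) -> bool:
    if name in (".", ".."):
        return False
    return bool(LEGAL_TOPIC.match(name))
```

For example, `is_valid_topic_name("orders.created")` passes, while a name containing a space does not.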

Partitions & Offsets

  • Topics have one or multiple partitions for parallel processing
  • Each partition is an ordered, immutable sequence of records
  • Messages in a partition are strictly ordered with sequential offsets
  • Offsets are partition-specific identifiers that are immutable
  • Each partition starts with offset 0
  • Default retention: 7 days (configurable by time/size)
  • Oldest messages are removed when retention limits are reached
  • Partitions are distributed across brokers for load balancing
  • Each has a leader broker and zero or more follower brokers
  • Partition count can be increased but not decreased after creation
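The behavior described above can be sketched as a tiny append-only log. This is a simplified model (the class and method names are hypothetical, not a Kafka client API), but it shows the key properties: sequential offsets starting at 0, forward-only reads, and retention that drops old records without renumbering the rest:

```python
class Partition:
    """Minimal sketch of a Kafka partition: an append-only log where
    each record is stamped with the next sequential offset."""

    def __init__(self):
        self._log = []          # (offset, value) pairs; never modified in place
        self._next_offset = 0   # the first record in a partition gets offset 0

    def append(self, value) -> int:
        offset = self._next_offset
        self._log.append((offset, value))
        self._next_offset += 1
        return offset

    def read_from(self, offset: int):
        # Consumers only read forward from a chosen offset.
        return [(o, v) for o, v in self._log if o >= offset]

    def enforce_retention(self, max_records: int):
        # Size-based retention: the oldest records are dropped, but the
        # offsets of the surviving records never change.
        while len(self._log) > max_records:
            self._log.pop(0)
```

Note that after retention kicks in, the log may no longer start at offset 0 — offsets are identifiers, not array indices.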

Producer

  • Write (publish) data to Kafka topics
  • Can specify partition or let Kafka handle assignment
  • Message components: key, value, headers, timestamp
  • Compression options: none (default), gzip, snappy, lz4, zstd
  • Timestamp options: system time (default) or custom
  • Support for retries, idempotence, and exactly-once semantics (transactions introduced in 0.11; idempotence enabled by default since 3.0)

Partitioning strategies:

  • Null key: sticky partitioning since Kafka 2.4 (round-robin before that)
  • Non-null key: consistent hashing (murmur2)
  • Custom partitioning possible
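These strategies can be sketched as follows. Kafka's default partitioner hashes non-null keys with murmur2; this sketch substitutes `zlib.crc32` to stay dependency-free (labeled as an assumption — only the `hash % num_partitions` shape matches the real thing), and uses classic round-robin for null keys:

```python
import itertools
import zlib

class PartitionerSketch:
    """Hedged sketch of producer partition selection (hypothetical class,
    not a client API). Real Kafka uses murmur2 for keyed records; crc32
    stands in here so the example runs without extra dependencies."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._rr = itertools.cycle(range(num_partitions))

    def partition(self, key) -> int:
        if key is None:
            # No key: spread records across partitions (round-robin here;
            # Kafka 2.4+ "sticky" fills one batch per partition instead).
            return next(self._rr)
        # Keyed records: the same key always maps to the same partition,
        # which is what preserves per-key ordering.
        return zlib.crc32(key) % self.num_partitions
```

The important invariant is the keyed case: identical keys always land on the same partition, so ordering per key is preserved — as long as the partition count doesn't change.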

Acknowledgment modes (acks):

  • acks=0: No acknowledgment (fire and forget)
  • acks=1: Leader acknowledgment only (default before Kafka 3.0)
  • acks=all/-1: Acknowledgment from the leader and all in-sync replicas (default since 3.0, alongside idempotence)
[Diagram: Kafka producers and consumers — producers write to topic partitions via key-based routing; messages are ordered within each partition; one partition per consumer within a consumer group]
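Putting the producer options together, a durability-focused configuration might look like the sketch below. The keys are real Kafka producer config names (Java-client style); the specific values are typical choices, not requirements:

```python
# Illustrative producer settings aimed at strong delivery guarantees.
# Keys are standard Kafka producer configs; values are one reasonable choice.
durable_producer_config = {
    "acks": "all",               # wait for leader + all in-sync replicas
    "enable.idempotence": True,  # broker de-duplicates retried sends
    "retries": 2147483647,       # keep retrying transient failures
    "compression.type": "zstd",  # or gzip / snappy / lz4 / none
    "linger.ms": 5,              # small batching delay to improve throughput
}
```

The trade-off is latency: `acks=all` waits on every in-sync replica, so it is slower than `acks=1` or `acks=0` but survives leader failure without losing acknowledged writes.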

Consumers

  • Pull (fetch) data from Kafka topics
  • Read messages in exact write order within each partition
  • Use deserializers for various data formats
  • Maintain position by tracking last consumed offset
  • Offset reset policies: earliest, latest, none
  • Configurable fetch settings for throughput vs. latency optimization
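The position-tracking and offset-reset behavior above can be sketched with a toy consumer (hypothetical class, not a client API). The key point: the `auto_offset_reset` policy only applies when no committed offset exists for the group:

```python
class ConsumerSketch:
    """Sketch of how a consumer tracks its position: it fetches forward
    from an offset and periodically commits the next offset to resume at."""

    def __init__(self, partition, auto_offset_reset: str = "earliest"):
        self.partition = partition        # list of records stands in for the log
        self.committed = None             # last committed position (None = never)
        self.auto_offset_reset = auto_offset_reset

    def starting_offset(self) -> int:
        # A committed offset always wins; the reset policy is a fallback.
        if self.committed is not None:
            return self.committed
        if self.auto_offset_reset == "earliest":
            return 0
        if self.auto_offset_reset == "latest":
            return len(self.partition)
        raise RuntimeError("no committed offset and policy is 'none'")

    def poll(self, offset: int, max_records: int = 10):
        return self.partition[offset : offset + max_records]

    def commit(self, next_offset: int):
        self.committed = next_offset
```

With `latest`, a fresh group starts at the log end and only sees new records; with `earliest`, it replays the whole retained log.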

Consumer Groups

  • Organized for parallel processing
  • Each consumer assigned exclusive partitions within a group
  • Dynamic rebalancing when consumers join/leave
  • Consumers beyond the partition count sit idle (at most one consumer per partition within a group)
  • Identified by unique group.id
  • Offsets committed to __consumer_offsets topic
  • Managed by a group coordinator broker
  • Partition assignment strategies: Range, RoundRobin, Sticky, CooperativeSticky
[Diagram: consumer groups and delivery semantics — a 4-partition topic consumed by Group A (4 consumers, 1 partition each) and Group B (2 consumers, 2 partitions each); offsets stored in the __consumer_offsets topic]
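The Range strategy listed above can be sketched for a single topic. This is a simplified model of the assignor's arithmetic (the function name is illustrative): partitions are handed out in contiguous chunks, the first `partitions % consumers` members get one extra, and surplus consumers get nothing:

```python
def range_assign(partitions, consumers):
    """Sketch of Range assignment for one topic: sorted partitions are
    split into contiguous chunks across sorted consumer IDs."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        n = per + (1 if i < extra else 0)   # first `extra` consumers get one more
        assignment[consumer] = partitions[start : start + n]
        start += n
    return assignment
```

For example, 4 partitions across 2 consumers yields two partitions each; a third consumer in a 2-partition group ends up with an empty assignment, matching the "idle consumer" point above.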

Delivery semantics for consumers

At least once (default):

  • Commit after processing
  • May cause duplicates if failure occurs
  • Requires idempotent consumers

At most once:

  • Commit on receive, before processing
  • No reprocessing on failure, potential data loss

Exactly once:

  • Via Kafka Transactions API
  • Requires idempotent producers and transactional consumers
  • Primarily for Kafka-to-Kafka workflows
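The difference between the first two semantics comes down to when the offset is committed relative to processing. This toy simulation (all names hypothetical) crashes a consumer between the two steps and then restarts it from the committed offset, making the loss or the duplicate visible:

```python
def simulate(records, commit_first, crash_at):
    """Run a consumer that dies between commit and processing of the
    record at index `crash_at`, then restart it from the committed
    offset. Returns every value processed across both runs."""
    processed, committed = [], 0
    # First run: crashes while handling records[crash_at].
    for offset, value in enumerate(records):
        if commit_first:                    # at-most-once
            committed = offset + 1
            if offset == crash_at:
                break                       # crash: committed but never processed
            processed.append(value)
        else:                               # at-least-once
            processed.append(value)
            if offset == crash_at:
                break                       # crash: processed but never committed
            committed = offset + 1
    # Restart: resume from the last committed offset.
    for value in records[committed:]:
        processed.append(value)
    return processed
```

With commit-first, the crashed-on record is skipped after restart (data loss); with commit-after, it is processed twice (a duplicate), which is why at-least-once pairs naturally with idempotent processing.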

Kafka brokers

  • Distributed system of multiple servers (3-100+)
  • Each identified by integer ID
  • Connect via bootstrap servers
  • Manage partitions, handle requests, manage replication
  • Automatic leadership transfer on failure
  • Controller broker manages administrative operations
[Diagram: brokers and replication — 3 brokers (ids 101-103) with replication factor 3; for each partition one broker is the leader and the others are followers (ISR), so the cluster tolerates 2 broker failures without data loss; a controller broker manages administrative tasks and leadership elections]

Topic replication factor

  • Replication factor = number of copies per partition
  • Recommended: factor of 3 for production
  • Must be ≤ number of brokers
  • Set at topic level, can differ between topics
  • With factor N, tolerate N-1 broker failures
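A simple way to see the "replication factor ≤ broker count" constraint is to sketch replica placement. Real Kafka also considers racks and load; this minimal version (hypothetical function) just spreads each partition's replicas round-robin across brokers, with the first replica acting as the initial leader:

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Sketch: place each partition's replicas on distinct brokers,
    rotating the starting broker per partition. The first broker in
    each list can be read as the initial leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor must be <= number of brokers")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + i) % len(brokers)]
                         for i in range(replication_factor)]
    return assignment
```

Rotating the starting broker keeps leadership roughly balanced, so no single broker serves all partition leaders.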

Concept of Leader for a Partition

  • One leader per partition handles all reads/writes
  • Followers passively replicate from leader
  • In-Sync Replicas (ISR) = leader + caught-up followers
  • With RF=3 on 3 brokers, each partition's ISR contains all 3 replicas while followers stay caught up
  • Automatic leadership transfer on failure
  • Controller manages elections

Kafka Topic durability

  • Replication factor N tolerates N-1 broker failures
  • Enhanced by min.insync.replicas setting
  • Strongest guarantees: RF=3, min.insync.replicas=2, acks=all
  • Trade-off: higher durability vs. latency/throughput
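The interaction between acks and min.insync.replicas can be captured in one predicate. This is a sketch of the broker's gate (the function is illustrative; the real broker returns a NOT_ENOUGH_REPLICAS error rather than a boolean):

```python
def accept_write(isr_size, min_insync_replicas, acks):
    """Sketch of the durability gate: with acks=all, the leader rejects
    produce requests when the in-sync replica set has shrunk below
    min.insync.replicas; acks=0 and acks=1 don't consult the ISR size."""
    if acks != "all":
        return True
    return isr_size >= min_insync_replicas
```

This is why RF=3 with min.insync.replicas=2 and acks=all is the usual recommendation: one broker can fail without blocking writes, yet every acknowledged write exists on at least two replicas.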
[Diagram: ZooKeeper vs KRaft — a ZooKeeper ensemble coordinating brokers (before Kafka 4.0) versus the self-managed KRaft controller quorum; timeline: KRaft preview in 2.8, supported in 3.0, production-ready in 3.3, ZooKeeper removed in 4.0]

Zookeeper

  • Managed metadata and broker coordination (historically)
  • Handled broker registration, configurations, elections
  • Stored consumer offsets in early releases, before they moved to the __consumer_offsets topic
  • Required until Kafka 2.7
  • KRaft mode preview in 2.8, official in 3.0
  • Production-ready in 3.3.0
  • Removed completely in 4.0
  • Typically used 3-5 nodes (7 for large clusters)
  • Leader-follower ensemble with quorum consensus

Related Articles

Kafka Consumer Rate Control
Controlling Processing Rate in Kafka Consumers

Learn how to control message processing rates in Kafka consumers for optimized throughput.

Data Encoding Tools
Base64 Encoding for Message Serialization

Useful for encoding binary data in Kafka messages. Try our Base64 encoding tool.
