Developer Playground

Kafka Basics

Updated: March 31, 2025

Topics

  • A category or feed name to which records are published
  • Identified by unique names within a Kafka cluster
  • Store messages in various formats (JSON, Avro, Protobuf, text, binary, custom)
  • Split into partitions for distributed data scaling
  • Configured with replication factor for fault tolerance
  • Support configurable retention policies (time/size based)
  • Immutable append-only logs - once written, cannot be modified
  • Names are case-sensitive (alphanumeric, dots, underscores, hyphens)
  • Internal topics: __consumer_offsets and __transaction_state
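
These properties translate directly into the Java AdminClient API. Below is a minimal sketch of topic creation, assuming a broker reachable at localhost:9092; the topic name "orders" and the 7-day retention override are illustrative:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumes a broker reachable at localhost:9092
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic "orders": 3 partitions, replication factor 3,
            // with a 7-day time-based retention policy set at the topic level
            NewTopic orders = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("retention.ms", "604800000"));
            admin.createTopics(Set.of(orders)).all().get();
        }
    }
}
```
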
[Figure: Kafka Topics and Partitions. Topics A and B are split into partitions; each message within a partition is identified by its offset.]

Partitions & Offsets

  • Topics have one or multiple partitions for parallel processing
  • Each partition is an ordered, immutable sequence of records
  • Messages in a partition are strictly ordered with sequential offsets
  • Offsets are partition-specific identifiers that are immutable
  • Each partition starts with offset 0
  • Default retention: 7 days (configurable by time/size)
  • Oldest messages are removed when retention limits are reached
  • Partitions are distributed across brokers for load balancing
  • Each has a leader broker and zero or more follower brokers
  • Partition count can be increased but not decreased after creation
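
Because the partition count can only grow, increasing it is an explicit admin operation. A sketch with the Java AdminClient, again using the hypothetical "orders" topic:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitionsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Grow the hypothetical "orders" topic from 3 to 6 partitions.
            // Existing records stay where they are, but key-based routing
            // changes because keys now hash across 6 partitions.
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(6)))
                 .all().get();
        }
    }
}
```

Shrinking is not supported; the only way to reduce partitions is to recreate the topic.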

Producer

  • Write (publish) data to Kafka topics
  • Can specify partition or let Kafka handle assignment
  • Message components: key, value, headers, timestamp
  • Compression options: none (default), gzip, snappy, lz4, zstd
  • Timestamp options: system time (default) or custom
  • Support for retries, idempotence, and exactly-once semantics (since 0.11)

Partitioning strategies:

  • Null key: round-robin distribution
  • Non-null key: consistent hashing (murmur2)
  • Custom partitioning possible
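
For the custom case, a producer can supply its own Partitioner implementation. A hypothetical sketch that pins keys with a "vip-" prefix to partition 0 and otherwise mirrors the default murmur2 hashing:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Hypothetical partitioner: routes keys starting with "vip-" to partition 0,
// falls back to murmur2 hashing (the default keyed strategy) otherwise.
public class VipFirstPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0; // no key: this sketch just uses partition 0
        }
        if (key instanceof String && ((String) key).startsWith("vip-")) {
            return 0;
        }
        // murmur2 hash of the serialized key, same as the default partitioner
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}
```

It is registered through the producer's partitioner.class setting (ProducerConfig.PARTITIONER_CLASS_CONFIG).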

Acknowledgment modes (acks):

  • acks=0: No acknowledgment (fire and forget)
  • acks=1: Leader acknowledgment only (default before Kafka 3.0)
  • acks=all/-1: Full acknowledgment from leader and all in-sync replicas (default since Kafka 3.0)
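
Putting these options together, a minimal producer sketch; the broker address, topic, key, and header values are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");               // leader + all in-sync replicas
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");   // none/gzip/snappy/lz4/zstd
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // safe retries, no duplicates

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Non-null key: murmur2 hashing routes every "order-42" record to one partition
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "order-42", "{\"status\":\"shipped\"}");
            record.headers().add("source", "warehouse-service".getBytes());
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    System.out.printf("wrote to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```
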
[Figure: Kafka Producers and Consumers. Producers write to a topic's partitions (key-based routing, acks, compression); within a consumer group each partition is read by one consumer, and messages in each partition stay ordered with immutable offsets.]

Consumers

  • Pull (fetch) data from Kafka topics
  • Read messages in exact write order within each partition
  • Use deserializers for various data formats
  • Maintain position by tracking last consumed offset
  • Offset reset policies: earliest, latest, none
  • Configurable fetch settings for throughput vs. latency optimization
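
A minimal consumer sketch showing deserializers, the offset reset policy, and a poll loop; the group id and topic name are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-readers"); // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // earliest | latest | none

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Records arrive in write order within each partition
                    System.out.printf("p%d@%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```
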

Consumer Groups

  • Organized for parallel processing
  • Each consumer assigned exclusive partitions within a group
  • Dynamic rebalancing when consumers join/leave
  • Consumers sit idle if there are more consumers than partitions
  • Identified by unique group.id
  • Offsets committed to __consumer_offsets topic
  • Managed by a group coordinator broker
  • Partition assignment strategies: Range, RoundRobin, Sticky, CooperativeSticky
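
The assignment strategy is a consumer-side setting. A small sketch selecting the cooperative strategy; the group id and topic name are placeholders:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Every consumer sharing this group.id divides the topic's partitions among itself
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-readers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // CooperativeStickyAssignor rebalances incrementally: unaffected consumers
        // keep their partitions instead of the whole group stopping (eager protocol)
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // joining triggers a group rebalance
        }
    }
}
```
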
[Figure: Kafka Consumer Groups and Delivery Semantics. A 4-partition topic consumed by Group A (4 consumers, 1 partition each) and Group B (2 consumers, 2 partitions each); offsets are stored in the __consumer_offsets topic. Delivery semantics: at least once (default, commit after processing, possible duplicates), at most once (commit on receive, possible data loss), exactly once (Transactions API, no duplicates, no loss).]

Delivery semantics for consumers

At least once (default):

  • Commit after processing
  • May cause duplicates if failure occurs
  • Requires idempotent consumers
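
A sketch of the at-least-once pattern with auto-commit disabled; process() stands in for whatever handler the application provides:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-readers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical handler; must tolerate duplicates
                }
                // Commit AFTER processing: a crash before this line replays the batch
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}
```
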

At most once:

  • Commit on receive, before processing
  • No reprocessing on failure, potential data loss

Exactly once:

  • Via Kafka Transactions API
  • Requires a transactional (idempotent) producer and consumers reading with isolation.level=read_committed
  • Primarily for Kafka-to-Kafka workflows
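
A consume-transform-produce sketch using the Transactions API; the topic names, group id, and transactional id are illustrative:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceExample {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-etl");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        cProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // skip aborted records

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-etl-1"); // implies idempotence

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            producer.initTransactions();
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        producer.send(new ProducerRecord<>("orders-enriched", r.key(), r.value()));
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                    new OffsetAndMetadata(r.offset() + 1));
                    }
                    // Output records and consumed offsets commit atomically
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    // A production loop would also rewind the consumer to the
                    // last committed offsets before retrying
                    producer.abortTransaction();
                }
            }
        }
    }
}
```
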

Kafka brokers

  • Distributed system of multiple servers (3-100+)
  • Each identified by integer ID
  • Connect via bootstrap servers
  • Manage partitions, handle requests, manage replication
  • Automatic leadership transfer on failure
  • Controller broker manages administrative operations
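
Connecting through bootstrap servers and listing the cluster is a short AdminClient sketch; the broker addresses are placeholders, and any one reachable broker returns metadata for the whole cluster:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

public class DescribeClusterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093");

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            for (Node node : cluster.nodes().get()) {
                // Each broker is identified by its integer ID
                System.out.printf("broker id=%d host=%s:%d%n", node.id(), node.host(), node.port());
            }
            System.out.println("controller: " + cluster.controller().get().id());
        }
    }
}
```
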
[Figure: Kafka Brokers and Replication. Three brokers (ids 101-103) hold leader and follower copies of each partition with replication factor 3; for each partition one broker is the leader and the others are followers (ISR), so the cluster tolerates 2 broker failures without data loss. The controller broker manages administrative tasks and leadership elections.]

Topic replication factor

  • Replication factor = number of copies per partition
  • Recommended: factor of 3 for production
  • Must be ≤ number of brokers
  • Set at topic level, can differ between topics
  • With factor N, tolerate N-1 broker failures

Concept of Leader for a Partition

  • One leader per partition handles all reads/writes
  • Followers passively replicate from leader
  • In-Sync Replicas (ISR) = leader + caught-up followers
  • With 3 partitions, RF=3, 3 brokers: each partition has ISR=3
  • Automatic leadership transfer on failure
  • Controller manages elections
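
Leader and ISR assignments can be inspected per partition. A sketch using the AdminClient's describeTopics on the hypothetical "orders" topic (allTopicNames requires Kafka 3.1+ clients):

```java
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class LeaderIsrExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Set.of("orders"))
                    .allTopicNames().get().get("orders");
            for (TopicPartitionInfo p : desc.partitions()) {
                // One leader node per partition; isr() lists the caught-up replicas
                System.out.printf("partition %d: leader=%d replicas=%s isr=%s%n",
                        p.partition(), p.leader().id(), p.replicas(), p.isr());
            }
        }
    }
}
```
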

Kafka Topic durability

  • Replication factor N tolerates N-1 broker failures
  • Enhanced by min.insync.replicas setting
  • Strongest guarantees: RF=3, min.insync.replicas=2, acks=all
  • Trade-off: higher durability vs. latency/throughput
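
A sketch that applies the min.insync.replicas=2 recommendation to a topic via an incremental config update; the topic name is illustrative:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DurabilityConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // With RF=3 and min.insync.replicas=2, acks=all writes succeed only
            // once at least 2 replicas (leader included) have the record, and the
            // topic keeps accepting writes with one broker down.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            AlterConfigOp setMinIsr = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, Set.of(setMinIsr))).all().get();
        }
    }
}
```
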
[Figure: ZooKeeper vs KRaft. Before Kafka 4.0, a ZooKeeper ensemble managed metadata, broker coordination, and leader election for the brokers; from 4.0, a KRaft controller quorum provides self-managed consensus and a simplified architecture. Version timeline: KRaft preview in v2.8, supported in v3.0, production-ready in v3.3, ZooKeeper removed in v4.0.]

ZooKeeper

  • Managed metadata and broker coordination (historically)
  • Handled broker registration, configurations, elections
  • Stored consumer offsets in early releases, before they moved to the __consumer_offsets topic
  • Required until Kafka 2.7
  • KRaft mode preview in 2.8, official in 3.0
  • Production-ready in 3.3.0
  • Removed completely in 4.0
  • Typically used 3-5 nodes (7 for large clusters)
  • Leader-follower ensemble with quorum consensus

Related Articles

  • Controlling Processing Rate in Kafka Consumers: learn how to control message processing rates in Kafka consumers for optimized throughput.
  • Base64 Encoding for Message Serialization: useful for encoding binary data in Kafka messages.