Can Kafka topic partitions be increased or decreased in prod?



If you created a Kafka topic sized for its current usage, you may later need to increase or decrease its partition count as the load changes.

You can safely increase the partition count over time if messages are distributed among the partitions in a round-robin fashion (i.e., they have no keys).
But be careful when the messages produced to the topic contain keys. When Kafka publishes a keyed message, the message is deterministically mapped to a partition based on a hash of the key. This guarantees that messages with the same key are always routed to the same partition, which matters for applications that rely on per-key ordering, since messages within a partition are always delivered to the consumer in order. If the number of partitions changes, messages with the same key may start landing on a different partition, so this ordering guarantee no longer holds across the change.
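For example, you can produce keyed messages from the console to see this routing in action (the broker address localhost:9092 and the topic name my-topic are placeholders):

bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-topic --property parse.key=true --property key.separator=:

Each line typed as key:value is hashed on the key to choose a partition, so adding partitions changes which partition a given key maps to from that point on.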

You can increase the partition count of a Kafka topic with the following command:


bin/kafka-topics.sh --bootstrap-server <broker:port> --topic <topic-name> --alter --partitions <new-partition-count>
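
After running the alter, you can confirm the new partition count with --describe (broker address and topic name are again placeholders):

bin/kafka-topics.sh --bootstrap-server <broker:port> --topic <topic-name> --describe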


Kafka does not allow you to decrease the partition count of a topic; the --alter command rejects a lower number. Decreasing partitions in prod is therefore only possible by recreating the topic, which risks data loss if done carelessly. If you still need fewer partitions, you can use one of the methods below.

Method-1:
Create a new Kafka topic with fewer partitions, copy the data from the old topic to the new one, and start using the new one. You can use Confluent Replicator for the copy step.
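As a rough sketch (the topic name, partition count, and replication factor here are illustrative), you would first create the smaller topic:

bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic-v2 --partitions 3 --replication-factor 3

and then run Confluent Replicator (or a similar copy tool) from the old topic to the new one before switching producers and consumers over.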

Method-2 (example commands follow the list):
1. Stop the producer and wait until the consumer has consumed all the messages. When the consumer lag reaches zero, stop the consumer.
2. Delete the Kafka topic and recreate it with fewer partitions.
3. Start the consumer.
4. Start the producer.
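
A minimal sketch of steps 1 and 2 using the stock CLI tools (the group name, topic name, partition count, and replication factor are placeholders; note that deletion requires delete.topic.enable=true on the brokers, which is the default in recent Kafka versions):

# Step 1: watch the consumer lag until the LAG column shows 0 for every partition
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group

# Step 2: delete the topic and recreate it with fewer partitions
bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic my-topic
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 2 --replication-factor 3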

With either method, if the Kafka messages contain keys, the per-key ordering of messages will no longer hold across the old and new topic layouts.

Whenever the partition count of a topic is increased or decreased, there is always a question of how the consumers will adjust.

Kafka ships with the following default properties:

auto.leader.rebalance.enable = true (broker setting)
metadata.max.age.ms = 300000 (client setting, 5 minutes)


The metadata.max.age.ms property means that a client forcefully refreshes its cluster metadata every five minutes, even if the Kafka cluster has not seen any partition leadership changes, so that new brokers or partitions are discovered proactively.
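
If you want a client to notice partition changes sooner, you can lower this on the client side. For example (the one-minute value, broker address, and topic name are illustrative):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --consumer-property metadata.max.age.ms=60000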


Decreasing or increasing the partitions marks the client's metadata as stale (internally, metadata.updateNeeded = true). However, this does not trigger an immediate update; the refresh happens only at the next metadata expiration (the default metadata.max.age.ms is 5 * 60 * 1000 ms).


So within five minutes, the metadata is refreshed. If the consumer's stored offset is no longer valid for the refreshed partitions, its position is reset to the earliest (zero) or the latest offset according to a configurable offset reset policy (auto.offset.reset). Typically, consumption then restarts either at the earliest or the latest offset (the default is latest).
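
For example, to make a console consumer start from the earliest offset when its stored offset is invalid (the group and topic names are placeholders):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --group my-group --consumer-property auto.offset.reset=earliest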

Resources:
Confluent Kafka Consumer
Kafka: Increased partitions could not be assigned in next rebalance 
Choose and Change the Partition Count
Apache Kafka metadata.max.age.ms
