I am using the KafkaConnector to consume data from a Kafka topic that has 5 partitions. I want to configure an FME job so that the KafkaConnector consumes data at the partition level, making the partition the unit of parallelism for better speed. Does the KafkaConnector have a built-in feature that handles partition parallelism transparently for the developer? That way I could be assured that parallelism takes place with a single KafkaConnector in the job. Please confirm.
Configuring the KafkaConnector to read data concurrently from multiple partitions of a topic
Best answer by gerhardatsafe
Hi @rbeerak,
The KafkaConnector will use all available partitions by default if the starting offset is set to earliest or latest.
To scale for higher throughput, a workspace that consumes from a Kafka topic can be published to FME Server and then submitted to run on multiple engines in parallel. As long as all running workspaces specify the same consumer group ID in the KafkaConnector, they will not consume duplicate messages; instead, the consumed messages are spread across the FME Server jobs. To make sure a job is restarted immediately after a crash, use the RTC (run until canceled) option in the advanced Run Workspace parameters (https://docs.safe.com/fme/html/FME_Server_Documentation/WebUI/Run-Workspace.htm).
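To illustrate how a shared consumer group ID spreads partitions across parallel jobs, here is a minimal Python sketch. It is a simplified round-robin model, not the actual Kafka assignor and not FME's internals; the function name `assign_partitions` and the `engine-N` labels are hypothetical and used only for illustration. It relies on one Kafka fact: within a consumer group, each partition is read by at most one consumer at a time.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over the consumers of one group (simplified model).

    Each partition goes to exactly one consumer, which is why jobs that
    share a consumer group id do not read duplicate messages.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    ordered = sorted(consumers)
    assignment = {name: [] for name in ordered}
    for partition in range(num_partitions):
        # Round-robin: partition p goes to consumer p mod (number of consumers).
        assignment[ordered[partition % len(ordered)]].append(partition)
    return assignment


if __name__ == "__main__":
    # Hypothetical scenario: a 5-partition topic read by 1, 2, 5 or 6 jobs.
    for job_count in (1, 2, 5, 6):
        jobs = [f"engine-{i}" for i in range(job_count)]
        print(job_count, assign_partitions(5, jobs))
```

In this model a single job owns all 5 partitions, two jobs split them 3 and 2, and a sixth job would receive no partitions at all, so for a 5-partition topic running more than 5 jobs in the same group adds no further parallelism.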
I hope this answers your questions and helps!