Developer Guide
Amazon Managed Streaming for Apache Kafka
Copyright © 2024 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon Managed Streaming for Apache Kafka: Developer Guide
Copyright © 2024 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon's trademarks and trade dress may not be used in connection with any product or service
that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any
manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are
the property of their respective owners, who may or may not be affiliated with, connected to, or
sponsored by Amazon.
Table of Contents
Welcome ........................................................................................................................................... 1
What is Amazon MSK? ................................................................................................................................ 1
Setting up ........................................................................................................................................ 3
Sign up for AWS ........................................................................................................................................... 3
Download libraries and tools ..................................................................................................................... 3
Get started ....................................................................................................................................... 5
Create an Amazon MSK cluster ................................................................................................................. 5
Create an IAM role to create topics ......................................................................................................... 6
Create a client machine .............................................................................................................................. 8
Create a topic in the Amazon MSK cluster ............................................................................................. 9
Produce and consume data ..................................................................................................................... 11
View Amazon MSK metrics ...................................................................................................................... 12
Delete the tutorial resources ................................................................................................................... 13
How it works .................................................................................................................................. 15
Create a provisioned MSK cluster ........................................................................................................... 16
Amazon MSK broker sizes ................................................................................................................... 16
Create a provisioned Amazon MSK cluster using the AWS Management Console ................... 18
Create a provisioned Amazon MSK cluster using the AWS CLI .................................................... 19
Create a provisioned Amazon MSK cluster with a custom Amazon MSK configuration using the AWS CLI ........................................................................................................................... 21
Create a provisioned Amazon MSK cluster using the Amazon MSK API .................................... 22
Delete a provisioned Amazon MSK cluster ........................................................................................... 23
Delete a provisioned Amazon MSK cluster using the AWS Management Console ................... 23
Delete a provisioned Amazon MSK cluster using the AWS CLI .................................................... 23
Delete a provisioned Amazon MSK cluster using the API ............................................................ 24
Get the bootstrap brokers for an Amazon MSK cluster ..................................................................... 24
Get the bootstrap brokers using the AWS Management Console .............................................. 24
Get the bootstrap brokers using the AWS CLI ................................................................................ 25
Get the bootstrap brokers using the API ......................................................................................... 25
List clusters .................................................................................................................................................. 26
List clusters using the AWS Management Console ........................................................................ 26
List clusters using the AWS CLI ......................................................................................................... 26
List clusters using the API .................................................................................................................. 26
Metadata management ............................................................................................................................. 27
ZooKeeper mode ................................................................................................................................... 27
KRaft mode ............................................................................................................................................ 29
Storage management for Amazon MSK clusters ................................................................................. 30
Tiered storage for Amazon MSK clusters ........................................................................................ 31
Scale up Amazon MSK broker storage ............................................................................................. 40
Provision storage throughput for brokers in an Amazon MSK cluster ......................................... 44
Update the Amazon MSK cluster broker size ....................................................................................... 49
Update the Amazon MSK cluster broker size using the AWS Management Console .................. 49
Update the Amazon MSK cluster broker size using the AWS CLI ................................................ 50
Updating the broker size using the API ........................................................................................... 51
Update the configuration of a cluster ................................................................................................... 52
Updating the configuration of a cluster using the AWS CLI ........................................................ 52
Update the configuration of an Amazon MSK cluster using the API ............................................ 54
Expand an Amazon MSK cluster ............................................................................................................... 54
Expand an Amazon MSK cluster using the AWS Management Console ....................................... 55
Expand an Amazon MSK cluster using the AWS CLI ........................................................................ 55
Expand an Amazon MSK cluster using the API ................................................................................. 57
Remove a broker ........................................................................................................................................ 57
Remove broker partitions ................................................................................................................... 58
Remove a broker with the Console ................................................................................................... 60
Remove a broker with the CLI ........................................................................................................... 61
Remove a broker with the API ........................................................................................................... 62
Update Amazon MSK cluster security .................................................................................................... 62
Update Amazon MSK cluster security settings using the AWS Management Console ............ 63
Update Amazon MSK cluster security settings using the AWS CLI ............................................. 63
Update Amazon MSK cluster security settings using the API ...................................................... 65
Reboot a broker for an Amazon MSK cluster ....................................................................................... 65
Reboot a broker for an Amazon MSK cluster using the AWS Management Console ............... 65
Reboot a broker for an Amazon MSK cluster using the AWS CLI ............................................... 66
Reboot a broker for an Amazon MSK cluster using the API ........................................................ 65
Patching ........................................................................................................................................................ 67
Tag an Amazon MSK cluster ...................................................................................................... 68
Tag basics for Amazon MSK clusters ................................................................................................ 68
Tag resources using the Amazon MSK API ...................................................................................... 70
Broker offline and client failover ............................................................................................................ 70
Amazon MSK configuration .......................................................................................................... 73
Custom Amazon MSK configurations ..................................................................................................... 73
Dynamic Amazon MSK configuration ............................................................................................... 82
Topic-level Amazon MSK configuration ........................................................................................... 83
Amazon MSK configuration states .................................................................................................... 83
Default Amazon MSK configuration ....................................................................................................... 83
Guidelines for tiered storage topic-level configurations ................................................................... 95
Amazon MSK configuration operations ................................................................................................. 96
Create an Amazon MSK configuration ............................................................................................. 97
Update an Amazon MSK configuration ............................................................................................ 98
Delete an Amazon MSK configuration ............................................................................................. 99
Describe an Amazon MSK configuration .......................................................................................... 99
Get details about an Amazon MSK configuration revision ........................................................ 100
List all Amazon MSK configurations in your account .................................................................. 101
MSK Serverless ............................................................................................................................ 103
Use MSK Serverless clusters .................................................................................................................. 104
Create a cluster ................................................................................................................................... 104
Create an IAM role for topics on MSK Serverless cluster ........................................................... 106
Create a client machine .................................................................................................................... 108
Create a topic ...................................................................................................................................... 110
Produce and consume data .............................................................................................................. 110
Delete resources .................................................................................................................................. 111
Configuration ............................................................................................................................................ 112
Monitoring ................................................................................................................................................. 114
MSK Connect ................................................................................................................................ 116
Amazon MSK Connect benefits ............................................................................................................ 116
Getting started ......................................................................................................................................... 118
Set up resources required for MSK Connect ................................................................................. 118
Create custom plugin ........................................................................................................................ 122
Create client machine and Apache Kafka topic ........................................................................... 123
Create connector ................................................................................................................................ 125
Send data to the MSK cluster .......................................................................................................... 126
Understand connectors ........................................................................................................................... 127
Understand connector capacity ....................................................................................................... 127
Create a connector ............................................................................................................................. 128
Connecting from connectors ............................................................................................................ 130
Create custom plugins ............................................................................................................................ 130
Understand MSK Connect workers ....................................................................................................... 131
Default worker configuration ........................................................................................................... 131
Supported worker configuration properties ................................................................................. 132
Create a custom configuration ........................................................................................................ 134
Manage connector offsets ................................................................................................................ 134
Configuration providers .......................................................................................................................... 138
Considerations ..................................................................................................................................... 138
Create custom plugin and upload to S3 ....................................................................................... 138
Configure parameters and permissions for different providers ................................................ 140
Create custom worker config ........................................................................................................... 145
Create the connector ......................................................................................................................... 146
IAM roles and policies ............................................................................................................................. 146
Understand service execution role .................................................................................................. 147
Example policies ................................................................................................................................. 149
Prevent cross-service confused deputy problem ......................................................................... 151
AWS managed policies ...................................................................................................................... 153
Use service-linked roles ..................................................................................................................... 156
Enable internet access ............................................................................................................................ 158
Set up a NAT gateway ....................................................................................................................... 158
Understand private DNS hostnames ................................................................................................... 160
Configure a VPC DHCP option ......................................................................................................... 161
Configure DNS attributes .................................................................................................................. 162
Handle connector creation failures ................................................................................................. 162
Logging ...................................................................................................................................................... 163
Preventing secrets from appearing in connector logs ................................................................ 164
Monitoring ................................................................................................................................................. 164
Examples .................................................................................................................................................... 167
Set up Amazon S3 sink connector .................................................................................................. 167
Use Debezium source connector ..................................................................................................... 169
Migrate to Amazon MSK Connect ........................................................................................................ 179
Understand internal topics used by Kafka Connect .................................................................... 179
State management ............................................................................................................................. 180
Migrate source connectors ............................................................................................................... 181
Migrate sink connectors .................................................................................................................... 182
Troubleshooting ....................................................................................................................................... 183
MSK Replicator ............................................................................................................................ 184
How Amazon MSK Replicator works .................................................................................................... 185
Data replication ................................................................................................................................... 185
Metadata replication .......................................................................................................................... 186
Topic name configuration ................................................................................................................. 187
Set up source and target clusters ........................................................................................................ 189
Prepare the Amazon MSK source cluster ...................................................................................... 189
Prepare the Amazon MSK target cluster ....................................................................................... 192
Tutorial: Create an Amazon MSK Replicator ...................................................................................... 192
Considerations for creating an Amazon MSK Replicator ............................................................ 193
Create replicator with AWS console ............................................................................................... 196
Edit MSK Replicator settings ................................................................................................................. 203
Delete an MSK Replicator ...................................................................................................................... 204
Monitor replication .................................................................................................................................. 205
MSK Replicator metrics ..................................................................................................................... 205
Use replication to increase resiliency .................................................................................................. 215
Considerations for building multi-Region Apache Kafka applications ..................................... 215
Using active-active versus active-passive cluster topology ....................................................... 215
Create an active-passive Kafka cluster ........................................................................................... 216
Failover to the secondary Region ................................................................................................... 216
Perform a planned failover .............................................................................................................. 216
Perform an unplanned failover ....................................................................................................... 218
Perform failback ................................................................................................................................. 219
Create an active-active setup .......................................................................................................... 221
Migrate from one Amazon MSK cluster to another ......................................................................... 222
Migrate from self-managed MirrorMaker2 to MSK Replicator ....................................................... 223
Troubleshoot MSK Replicator ................................................................................................................ 223
MSK Replicator state goes from CREATING to FAILED ............................................................... 223
MSK Replicator appears stuck in the CREATING state ................................................................ 224
MSK Replicator is not replicating data or replicating only partial data ................................... 224
Message offsets in the target cluster are different than the source cluster ........................... 225
MSK Replicator is not syncing consumer group offsets or consumer group does not exist on target cluster ..................................................................................................................................... 225
Replication latency is high or keeps increasing ........................................................................... 226
Best practices for using MSK Replicator ............................................................................................. 228
Managing MSK Replicator throughput using Kafka quotas ....................................................... 228
Setting cluster retention period ...................................................................................................... 229
Cluster states ............................................................................................................................... 230
Security ........................................................................................................................................ 232
Data protection ........................................................................................................................................ 233
Amazon MSK encryption ................................................................................................................... 234
Get started with Amazon MSK encryption ................................................................................... 235
Authentication and authorization for Amazon MSK APIs ................................................................ 238
How Amazon MSK works with IAM ................................................................................................ 238
Identity-based policy examples ....................................................................................................... 242
Service-linked roles ............................................................................................................................ 246
AWS managed policies ...................................................................................................................... 249
Troubleshoot Amazon MSK identity and access .......................................................................... 257
Authentication and authorization for Apache Kafka APIs ............................................................... 258
IAM access control .............................................................................................................................. 258
Mutual TLS authentication ............................................................................................................... 276
SASL/SCRAM authentication ............................................................................................................ 281
Apache Kafka ACLs ............................................................................................................................. 286
Changing security groups ...................................................................................................................... 288
Controlling access to Apache ZooKeeper ........................................................................................... 289
To place your Apache ZooKeeper nodes in a separate security group .................................... 289
Using TLS security with Apache ZooKeeper ................................................................................. 290
Amazon MSK logging .............................................................................................................................. 292
Broker logs ........................................................................................................................................... 292
CloudTrail events ................................................................................................................................ 295
Compliance validation ............................................................................................................................ 299
Resilience ................................................................................................................................................... 300
Infrastructure security ............................................................................................................................. 300
Connect to an MSK cluster ......................................................................................................... 302
Turn on public access .............................................................................................................................. 302
Access from within AWS ......................................................................................................................... 306
Amazon VPC peering ......................................................................................................................... 306
AWS Direct Connect ........................................................................................................................... 306
AWS Transit Gateway ......................................................................................................................... 307
VPN connections ................................................................................................................................. 307
REST proxies ........................................................................................................................................ 307
Multiple Region multi-VPC connectivity ........................................................................................ 307
Single Region multi-VPC private connectivity .............................................................................. 307
EC2-Classic networking is retired .................................................................................................... 307
Multi-VPC private connectivity in a single Region ...................................................................... 308
Port information ................................................................................................................................. 322
Migrate to Amazon MSK Cluster ................................................................................................ 323
Migrate your Apache Kafka cluster to Amazon MSK ........................................................................ 323
Migrate from one Amazon MSK cluster to another ......................................................................... 324
MirrorMaker 1.0 best practices ............................................................................................................. 325
Advantages of MirrorMaker 2.* ............................................................................................................. 326
Monitor a cluster ......................................................................................................................... 328
Metrics for monitoring with CloudWatch ........................................................................................... 328
DEFAULT Level monitoring .............................................................................................................. 329
PER_BROKER Level monitoring ....................................................................................................... 336
PER_TOPIC_PER_BROKER Level monitoring ............................................................................... 344
PER_TOPIC_PER_PARTITION Level monitoring ........................................................................ 346
View metrics using CloudWatch ........................................................................................................... 347
Monitor consumer lags ........................................................................................................................... 348
Monitor with Prometheus ...................................................................................................................... 348
Enable open monitoring on new clusters ..................................................................................... 349
Enable open monitoring on existing clusters ............................................................................... 349
Set up a Prometheus host ................................................................................................................ 350
Use Prometheus metrics ................................................................................................................... 353
Store Prometheus metrics ................................................................................................................ 353
Use storage capacity alerts ................................................................................................................... 353
Monitor storage capacity alerts ...................................................................................................... 354
Cruise Control .............................................................................................................................. 355
Automated deployment template ........................................................................................................ 357
Quota ............................................................................................................................................ 358
Amazon MSK quota ................................................................................................................................. 358
MSK Replicator quotas ........................................................................................................................... 359
Quota for serverless clusters ................................................................................................................. 359
MSK Connect quota ................................................................................................................................. 361
Resources ...................................................................................................................................... 362
MSK integrations ......................................................................................................................... 363
Athena connector for Amazon MSK .................................................................................................... 363
Redshift integration for Amazon MSK ................................................................................................ 363
Firehose integration for Amazon MSK ................................................................................................ 363
Access EventBridge pipes ....................................................................................................................... 364
Apache Kafka versions ................................................................................................................ 366
Supported Apache Kafka versions ....................................................................................................... 366
Apache Kafka version 3.7.x (with production-ready tiered storage) ........................................ 367
Apache Kafka version 3.6.0 (with production-ready tiered storage) ........................................ 368
Amazon MSK version 3.5.1 ............................................................................................................... 368
Amazon MSK version 3.4.0 ............................................................................................................... 369
Amazon MSK version 3.3.2 ............................................................................................................... 369
Amazon MSK version 3.3.1 ............................................................................................................... 369
Amazon MSK version 3.1.1 ............................................................................................................... 370
Amazon MSK tiered storage version 2.8.2.tiered ......................................................................... 370
Apache Kafka version 2.5.1 .............................................................................................................. 370
Amazon MSK bug-fix version 2.4.1.1 ............................................................................................. 371
Apache Kafka version 2.4.1 (use 2.4.1.1 instead) ........................................................................ 371
Amazon MSK version support ............................................................................................................... 372
Amazon MSK version support policy ............................................................................................. 372
Update the Apache Kafka version .................................................................................................. 373
Best practices for version upgrades ............................................................................................... 376
Troubleshoot Amazon MSK cluster ............................................................................................ 378
Volume replacement causes disk saturation due to replication overload .................................... 379
Consumer group stuck in PreparingRebalance state ................................................................. 379
Static membership protocol ............................................................................................................. 380
Identify and reboot ............................................................................................................................ 380
Error delivering broker logs to Amazon CloudWatch Logs ............................................................. 380
No default security group ...................................................................................................................... 381
Cluster appears stuck in the CREATING state .................................................................................... 381
Cluster state goes from CREATING to FAILED ................................................................................... 381
Cluster state is ACTIVE but producers cannot send data or consumers cannot receive data .... 382
AWS CLI doesn't recognize Amazon MSK ........................................................................................... 382
Partitions go offline or replicas are out of sync ................................................................................ 382
Disk space is running low ...................................................................................................................... 382
Memory running low ............................................................................................................................... 382
Producer gets NotLeaderForPartitionException ................................................................................ 383
Under-replicated partitions (URP) greater than zero ....................................................................... 383
Cluster has topics called __amazon_msk_canary and __amazon_msk_canary_state ................. 383
Partition replication fails ........................................................................................................................ 383
Unable to access cluster that has public access turned on ............................................................. 383
Unable to access cluster from within AWS: Networking issues ...................................................... 384
Amazon EC2 client and MSK cluster in the same VPC ............................................................... 385
Amazon EC2 client and MSK cluster in different VPCs ............................................................... 385
On-premises client ............................................................................................................................. 386
AWS Direct Connect ........................................................................................................................... 386
Failed authentication: Too many connects ......................................................................................... 386
MSK Serverless: Cluster creation fails ................................................................................................. 386
Best practices ............................................................................................................................... 388
Right-size your cluster: Number of partitions per broker .............................................................. 388
Right-size your cluster: Number of brokers per cluster ................................................................... 389
Optimize cluster throughput for m5.4xl, m7g.4xl or larger instances .......................................... 389
Use latest Kafka AdminClient to avoid topic ID mismatch issue .................................................... 390
Build highly available clusters .............................................................................................................. 391
Monitor CPU usage .................................................................................................................................. 391
Monitor disk space ................................................................................................................................... 393
Adjust data retention parameters ........................................................................................................ 393
Speeding up log recovery after unclean shutdown .......................................................................... 394
Monitor Apache Kafka memory ............................................................................................................ 394
Don't add non-MSK brokers .................................................................................................................. 395
Enable in-transit encryption .................................................................................................................. 395
Reassign partitions .................................................................................................................................. 395
Document history ........................................................................................................................ 396
Welcome to the Amazon MSK Developer Guide
Welcome to the Amazon MSK Developer Guide. The following topics can help you get started using
this guide, based on what you're trying to do.
Create an Amazon MSK cluster by following the Get started using Amazon MSK tutorial.
Dive deeper into the functionality of Amazon MSK in Amazon MSK: How it works.
Run Apache Kafka without having to manage and scale cluster capacity with What is MSK
Serverless?.
Use Understand MSK Connect to stream data to and from your Apache Kafka cluster.
Use What is Amazon MSK Replicator? to reliably replicate data across Amazon MSK clusters in the
same or different AWS Regions.
For highlights, product details, and pricing, see the service page for Amazon MSK.
What is Amazon MSK?
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that
enables you to build and run applications that use Apache Kafka to process streaming data.
Amazon MSK provides the control-plane operations, such as those for creating, updating,
and deleting clusters. It lets you use Apache Kafka data-plane operations, such as those for
producing and consuming data. It runs open-source versions of Apache Kafka. This means existing
applications, tooling, and plugins from partners and the Apache Kafka community are supported
without requiring changes to application code. You can use Amazon MSK to create clusters that
use any of the Apache Kafka versions listed under the section called “Supported Apache Kafka
versions”.
These components describe the architecture of Amazon MSK:
Broker nodes — When creating an Amazon MSK cluster, you specify how many broker nodes
you want Amazon MSK to create in each Availability Zone. The minimum is one broker per
Availability Zone. Each Availability Zone has its own virtual private cloud (VPC) subnet.
ZooKeeper nodes — Amazon MSK also creates the Apache ZooKeeper nodes for you. Apache
ZooKeeper is an open-source server that enables highly reliable distributed coordination.
KRaft controllers — The Apache Kafka community developed KRaft to replace Apache
ZooKeeper for metadata management in Apache Kafka clusters. In KRaft mode, cluster metadata
is propagated within a group of Kafka controllers, which are part of the Kafka cluster, instead of
across ZooKeeper nodes. KRaft controllers are included at no additional cost to you, and require
no additional setup or management from you.
Note
From Apache Kafka version 3.7.x on MSK, you can create clusters that use KRaft mode
instead of ZooKeeper mode.
Producers, consumers, and topic creators — Amazon MSK lets you use Apache Kafka data-plane
operations to create topics and to produce and consume data.
Cluster operations — You can use the AWS Management Console, the AWS Command Line
Interface (AWS CLI), or the APIs in the SDK to perform control-plane operations. For example,
you can create or delete an Amazon MSK cluster, list all the clusters in an account, view the
properties of a cluster, and update the number and type of brokers in a cluster.
Amazon MSK detects and automatically recovers from the most common failure scenarios for
clusters so that your producer and consumer applications can continue their write and read
operations with minimal impact. When Amazon MSK detects a broker failure, it mitigates the
failure or replaces the unhealthy or unreachable broker with a new one. In addition, where possible,
it reuses the storage from the older broker to reduce the data that Apache Kafka needs to replicate.
Your availability impact is limited to the time required for Amazon MSK to complete the detection
and recovery. After a recovery, your producer and consumer apps can continue to communicate
with the same broker IP addresses that they used before the failure.
Setting up Amazon MSK
Before you use Amazon MSK for the first time, complete the following tasks.
Tasks
Sign up for AWS
Download libraries and tools
Sign up for AWS
When you sign up for AWS, your Amazon Web Services account is automatically signed up for all
services in AWS, including Amazon MSK. You are charged only for the services that you use.
If you have an AWS account already, skip to the next task. If you don't have an AWS account, use
the following procedure to create one.
To sign up for an Amazon Web Services account
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code
on the phone keypad.
When you sign up for an AWS account, an AWS account root user is created. The root user
has access to all AWS services and resources in the account. As a security best practice, assign
administrative access to a user, and use only the root user to perform tasks that require root
user access.
Download libraries and tools
The following libraries and tools can help you work with Amazon MSK:
The AWS Command Line Interface (AWS CLI) supports Amazon MSK. The AWS CLI enables
you to control multiple Amazon Web Services from the command line and automate them
through scripts. Upgrade your AWS CLI to the latest version to ensure that it has support for
the Amazon MSK features that are documented in this user guide. For detailed instructions on
how to upgrade the AWS CLI, see Installing the AWS Command Line Interface. After you install
the AWS CLI, you must configure it. For information on how to configure the AWS CLI, see aws
configure. Example commands for both tasks appear after this list.
The Amazon Managed Streaming for Kafka API Reference documents the API operations that
Amazon MSK supports.
The Amazon Web Services SDKs for Go, Java, JavaScript, .NET, Node.js, PHP, Python, and Ruby
include Amazon MSK support and samples.
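For example, after you install or upgrade the AWS CLI, you can confirm the installed version and
then set up your credentials. Both of the following are standard AWS CLI commands; the configure
command prompts you for your access key, secret key, default Region, and output format.

aws --version
aws configure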
Get started using Amazon MSK
This tutorial shows you an example of how you can create an MSK cluster, produce and consume
data, and monitor the health of your cluster using metrics. This example doesn't represent all
the options you can choose when you create an MSK cluster. In different parts of this tutorial, we
choose default options for simplicity. This doesn't mean that they're the only options that work for
setting up an MSK cluster or client instances.
Topics
Step 1: Create an Amazon MSK cluster
Step 2: Create an IAM role granting access to create topics on the Amazon MSK cluster
Step 3: Create a client machine
Step 4: Create a topic in the Amazon MSK cluster
Step 5: Produce and consume data
Step 6: Use Amazon CloudWatch to view Amazon MSK metrics
Step 7: Delete the AWS resources created for this tutorial
Step 1: Create an Amazon MSK cluster
In this step of Get started using Amazon MSK, you create an Amazon MSK cluster.
To create an Amazon MSK cluster using the AWS Management Console
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. Choose Create cluster.
3. For Creation method, leave the Quick create option selected. The Quick create option lets
you create a cluster with default settings.
4. For Cluster name, enter a descriptive name for your cluster. For example,
MSKTutorialCluster.
5. For General cluster properties, choose Provisioned as the Cluster type.
6. From the table under All cluster settings, copy the values of the following settings and save
them because you need them later in this tutorial:
VPC
Subnets
Security groups associated with VPC
7. Choose Create cluster.
8. Check the cluster Status on the Cluster summary page. The status changes from Creating to
Active as Amazon MSK provisions the cluster. When the status is Active, you can connect to
the cluster. For more information about cluster status, see Understand cluster states.
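The Quick create option is available only in the console. If you prefer to script cluster creation,
the AWS CLI provides the create-cluster command. The following is a minimal sketch for a
provisioned cluster; the Apache Kafka version, subnet IDs, and security group ID are placeholders
that you must replace with your own values.

aws kafka create-cluster \
    --cluster-name MSKTutorialCluster \
    --kafka-version 3.6.0 \
    --number-of-broker-nodes 3 \
    --broker-node-group-info '{"InstanceType":"kafka.m5.large","ClientSubnets":["subnet-0aaa","subnet-0bbb","subnet-0ccc"],"SecurityGroups":["sg-0ddd"]}'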
Next Step
Step 2: Create an IAM role granting access to create topics on the Amazon MSK cluster
Step 2: Create an IAM role granting access to create topics on
the Amazon MSK cluster
In this step, you perform two tasks. The first task is to create an IAM policy that grants access to
create topics on the cluster and to send data to those topics. The second task is to create an IAM
role and associate this policy with it. In a later step, you create a client machine that assumes this
role and uses it to create a topic on the cluster and to send data to that topic.
To create an IAM policy that makes it possible to create topics and write to them
1. Open the IAM console at https://console.aws.amazon.com/iam/.
2. On the navigation pane, choose Policies.
3. Choose Create Policy.
4. Choose the JSON tab, then replace the JSON in the editor window with the following JSON.
Replace region with the code of the AWS region where you created your cluster. Replace
Account-ID with your account ID. Replace MSKTutorialCluster with the name of your
cluster.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kafka-cluster:Connect",
"kafka-cluster:AlterCluster",
"kafka-cluster:DescribeCluster"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:cluster/MSKTutorialCluster/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:*Topic*",
"kafka-cluster:WriteData",
"kafka-cluster:ReadData"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:topic/MSKTutorialCluster/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:AlterGroup",
"kafka-cluster:DescribeGroup"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:group/MSKTutorialCluster/*"
]
}
]
}
For instructions on how to write secure policies, see the section called “IAM access control”.
5. Choose Next: Tags.
6. Choose Next: Review.
7. For the policy name, enter a descriptive name, such as msk-tutorial-policy.
8. Choose Create policy.
To create an IAM role and attach the policy to it
1. On the navigation pane, choose Roles.
2. Choose Create role.
3. Under Common use cases, choose EC2, then choose Next: Permissions.
4. In the search box, enter the name of the policy that you previously created for this tutorial.
Then select the box to the left of the policy.
5. Choose Next: Tags.
6. Choose Next: Review.
7. For the role name, enter a descriptive name, such as msk-tutorial-role.
8. Choose Create role.
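If you prefer the AWS CLI, you can create the same policy and role with commands like the
following sketch. It assumes that you saved the policy JSON from the previous procedure as
msk-tutorial-policy.json and the EC2 trust policy shown here as ec2-trust-policy.json; both file
names are illustrative, and Account-ID is your account ID.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "ec2.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}

aws iam create-policy --policy-name msk-tutorial-policy \
    --policy-document file://msk-tutorial-policy.json
aws iam create-role --role-name msk-tutorial-role \
    --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy --role-name msk-tutorial-role \
    --policy-arn arn:aws:iam::Account-ID:policy/msk-tutorial-policy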
Next Step
Step 3: Create a client machine
Step 3: Create a client machine
In this step of Get started using Amazon MSK, you create a client machine. You use this client
machine to create a topic, and to produce and consume data. For simplicity, you'll create this client
machine in the VPC that is associated with the MSK cluster so that the client can easily connect to
the cluster.
To create a client machine
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. Choose Launch instances.
3. Enter a Name for your client machine, such as MSKTutorialClient.
4. Leave Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type selected for Amazon
Machine Image (AMI) type.
5. Leave the t2.micro instance type selected.
6. Under Key pair (login), choose Create a new key pair. Enter MSKKeyPair for Key pair name,
and then choose Download Key Pair. Alternatively, you can use an existing key pair.
7. Expand the Advanced details section and choose the IAM role that you created in Step 2:
Create an IAM role.
8. Choose Launch instance.
9. Choose View Instances. Then, in the Security Groups column, choose the security group that
is associated with your new instance. Copy the ID of the security group, and save it for later.
10. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
11. In the navigation pane, choose Security Groups. Find the security group whose ID you saved in
the section called “Create an Amazon MSK cluster”.
12. In the Inbound Rules tab, choose Edit inbound rules.
13. Choose Add rule.
14. In the new rule, choose All traffic in the Type column. In the second field in the Source
column, select the security group of your client machine. This is the group whose ID you
saved after you launched the client machine instance.
15. Choose Save rules. Now the cluster's security group can accept traffic that comes from the
client machine's security group.
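Steps 12 through 15 can also be performed with a single AWS CLI command. In the following
sketch, sg-cluster-id and sg-client-id are placeholders for the cluster's security group ID and the
client machine's security group ID; the rule allows all traffic from the client security group, which
matches what you configured in the console.

aws ec2 authorize-security-group-ingress \
    --group-id sg-cluster-id \
    --ip-permissions IpProtocol=-1,UserIdGroupPairs='[{GroupId=sg-client-id}]'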
Next Step
Step 4: Create a topic in the Amazon MSK cluster
Step 4: Create a topic in the Amazon MSK cluster
In this step of Get started using Amazon MSK, you install Apache Kafka client libraries and
tools on the client machine, and then you create a topic.
Warning
Apache Kafka version numbers used in this tutorial are examples only. We recommend that
you use the same version of the client as your MSK cluster version. An older client version
may be missing certain features and critical bug fixes.
To find the version of your MSK cluster
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Select the MSK cluster.
3. Note the version of Apache Kafka used on the cluster.
4. Replace instances of Amazon MSK version numbers in this tutorial with the version obtained in
Step 3.
To create a topic on the client machine
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. In the navigation pane, choose Instances. Then select the check box beside the name of the
client machine that you created in Step 3: Create a client machine.
3. Choose Actions, and then choose Connect. Follow the instructions in the console to connect to
your client machine.
4. Install Java on the client machine by running the following command:
sudo yum -y install java-11
5. Run the following command to download Apache Kafka.
wget https://archive.apache.org/dist/kafka/{YOUR MSK VERSION}/kafka_2.13-{YOUR MSK VERSION}.tgz
Note
If you want to use a mirror site other than the one used in this command, you can
choose a different one on the Apache website.
6. Run the following command in the directory where you downloaded the TAR file in the
previous step.
tar -xzf kafka_2.13-{YOUR MSK VERSION}.tgz
7. Go to the kafka_2.13-{YOUR MSK VERSION}/libs directory, then run the following command to download the Amazon MSK IAM JAR file. The Amazon MSK IAM JAR makes it possible for the client machine to access the cluster.
wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.1/aws-msk-iam-auth-1.1.1-all.jar
8. Go to the kafka_2.13-{YOUR MSK VERSION}/bin directory. Copy the following property settings and paste them into a new file. Name the file client.properties and save it.
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
9. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
10. Wait for the status of your cluster to become Active. This might take several minutes. After
the status becomes Active, choose the cluster name. This takes you to a page containing the
cluster summary.
11. Choose View client information.
12. Copy the connection string for the private endpoint.
The connection string includes an endpoint for each of three brokers. You only need one broker
endpoint for the following step.
13. Run the following command, replacing BootstrapServerString with one of the broker endpoints that you obtained in the previous step.
<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-server BootstrapServerString --command-config client.properties --replication-factor 3 --partitions 1 --topic MSKTutorialTopic
If the command succeeds, you see the following message: Created topic
MSKTutorialTopic.
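To confirm that the topic exists, you can list the topics on the cluster with the same client configuration. This check is optional; it assumes that you run it from the same directory and reuse the same broker endpoint.
<path-to-your-kafka-installation>/bin/kafka-topics.sh --list --bootstrap-server BootstrapServerString --command-config client.properties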
Next Step
Step 5: Produce and consume data
Step 5: Produce and consume data
In this step of Get Started Using Amazon MSK, you produce and consume data.
To produce and consume messages
1. Run the following command to start a console producer. Replace BootstrapServerString
with the plaintext connection string that you obtained in Create a topic. For instructions on
how to retrieve this connection string, see Getting the bootstrap brokers for an Amazon MSK
cluster.
<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list BootstrapServerString --producer.config client.properties --topic MSKTutorialTopic
2. Enter any message that you want, and press Enter. Repeat this step two or three times. Every
time you enter a line and press Enter, that line is sent to your Apache Kafka cluster as a
separate message.
3. Keep the connection to the client machine open, and then open a second, separate connection
to that machine in a new window.
4. In the following command, replace BootstrapServerString with the plaintext connection string that you saved earlier. Then, to create a console consumer, run the following command with your second connection to the client machine.
<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server BootstrapServerString --consumer.config client.properties --topic MSKTutorialTopic --from-beginning
You should start seeing the messages that you entered earlier when you used the console
producer command.
5. Enter more messages in the producer window, and watch them appear in the consumer
window.
Next Step
Step 6: Use Amazon CloudWatch to view Amazon MSK metrics
Step 6: Use Amazon CloudWatch to view Amazon MSK metrics
In this step of Getting Started Using Amazon MSK, you look at the Amazon MSK metrics in Amazon
CloudWatch.
To view Amazon MSK metrics in CloudWatch
1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Metrics.
3. Choose the All metrics tab, and then choose AWS/Kafka.
4. To view broker-level metrics, choose Broker ID, Cluster Name. For cluster-level metrics, choose
Cluster Name.
5. (Optional) In the graph pane, select a statistic and a time period, and then create a
CloudWatch alarm using these settings.
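As an alternative to the console, you can retrieve the same metrics with the AWS CLI. The following is a minimal sketch using the get-metric-statistics command; the cluster name, broker ID, and time window are example values that you should replace with your own.
aws cloudwatch get-metric-statistics --namespace AWS/Kafka --metric-name BytesInPerSec --dimensions Name="Cluster Name",Value="MSKTutorialCluster" Name="Broker ID",Value="1" --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T01:00:00Z --period 300 --statistics Average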
Next Step
Step 7: Delete the AWS resources created for this tutorial
Step 7: Delete the AWS resources created for this tutorial
In the final step of Getting Started Using Amazon MSK, you delete the MSK cluster and the client
machine that you created for this tutorial.
To delete the resources using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the name of your cluster. For example, MSKTutorialCluster.
3. Choose Actions, then choose Delete.
4. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
5. Choose the instance that you created for your client machine, for example,
MSKTutorialClient.
6. Choose Instance state, then choose Terminate instance.
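If you prefer to clean up from the command line instead, the following AWS CLI commands perform the same deletions. Replace ClusterArn with your cluster's ARN and Your-Instance-ID with the instance ID of your client machine; both are placeholders.
aws kafka delete-cluster --cluster-arn ClusterArn
aws ec2 terminate-instances --instance-ids Your-Instance-ID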
To delete the IAM policy and role
1. Open the IAM console at https://console.aws.amazon.com/iam/.
2. On the navigation pane, choose Roles.
3. In the search box, enter the name of the IAM role that you created for this tutorial.
4. Choose the role. Then choose Delete role, and confirm the deletion.
5. On the navigation pane, choose Policies.
6. In the search box, enter the name of the policy that you created for this tutorial.
7. Choose the policy to open its summary page. On the policy's Summary page, choose Delete
policy.
8. Choose Delete.
Amazon MSK: How it works
An Amazon MSK cluster is the primary Amazon MSK resource that you can create in your account.
The topics in this section describe how to perform common Amazon MSK operations. For a list of
all the operations that you can perform on an MSK cluster, see the following:
The AWS Management Console
The Amazon MSK API Reference
The Amazon MSK CLI Command Reference
Topics
Create a provisioned Amazon MSK cluster
Delete a provisioned Amazon MSK cluster
Get the bootstrap brokers for an Amazon MSK cluster
List Amazon MSK clusters
Metadata management
Storage management for Amazon MSK clusters
Update the Amazon MSK cluster broker size
Update the configuration of an Amazon MSK cluster
Expand the number of brokers in an Amazon MSK cluster
Remove a broker from an Amazon MSK cluster
Update security settings of an Amazon MSK cluster
Reboot a broker for an Amazon MSK cluster
Impact of broker restarts during patching and other maintenance
Tag an Amazon MSK cluster
Broker offline and client failover
Create a provisioned Amazon MSK cluster
Important
You can't change the VPC for a provisioned MSK cluster after you create the cluster.
Before you can create a provisioned Amazon MSK cluster, you need to have an Amazon Virtual
Private Cloud (VPC) and set up subnets within that VPC.
You need two subnets in two different Availability Zones in the US West (N. California) Region. In
all other Regions where Amazon MSK is available, you can specify either two or three subnets. Your
subnets must all be in different Availability Zones. When you create a provisioned MSK cluster,
Amazon MSK distributes the broker nodes evenly over the subnets that you specify.
Topics
Amazon MSK broker sizes
Create a provisioned Amazon MSK cluster using the AWS Management Console
Create a provisioned Amazon MSK cluster using the AWS CLI
Create a provisioned Amazon MSK cluster with a custom Amazon MSK configuration using the
AWS CLI
Create a provisioned Amazon MSK cluster using the Amazon MSK API
Amazon MSK broker sizes
When you create an Amazon MSK cluster, you specify the size of brokers that you want it to have.
Amazon MSK supports the following broker sizes:
kafka.t3.small
kafka.m5.large, kafka.m5.xlarge, kafka.m5.2xlarge, kafka.m5.4xlarge, kafka.m5.8xlarge,
kafka.m5.12xlarge, kafka.m5.16xlarge, kafka.m5.24xlarge
kafka.m7g.large, kafka.m7g.xlarge, kafka.m7g.2xlarge, kafka.m7g.4xlarge, kafka.m7g.8xlarge,
kafka.m7g.12xlarge, kafka.m7g.16xlarge
M7g brokers use AWS Graviton processors (custom Arm-based processors built by Amazon Web
Services). M7g brokers offer improved price performance relative to comparable M5 instances. M7g
brokers consume less power than comparable M5 instances.
M7g Graviton brokers are not available in these regions: CDG (Paris), CGK (Jakarta), CPT (Cape
Town), DXB (Dubai), HKG (Hong Kong), KIX (Osaka), LHR (London), MEL (Melbourne), MXP (Milan),
OSU (US-East), PDT (US-West), TLV (Tel Aviv), YYC (Calgary), ZRH (Zurich).
Amazon MSK supports M7g brokers on provisioned MSK clusters running one of the following Kafka
versions:
2.8.2.tiered
3.3.2
3.4.0
3.5.1
3.6.0 with tiered storage
3.7.x
3.7.x.kraft
M7g and M5 brokers have higher baseline throughput performance than T3 brokers and are
recommended for production workloads. M7g and M5 brokers can also have more partitions
per broker than T3 brokers. Use M7g or M5 brokers if you are running larger production-grade
workloads or require a greater number of partitions. To learn more about M7g and M5 instance
sizes, see Amazon EC2 General Purpose Instances.
T3 brokers have the ability to use CPU credits to temporarily burst performance. Use T3 brokers for
low-cost development, if you are testing small to medium streaming workloads, or if you have low-
throughput streaming workloads that experience temporary spikes in throughput. We recommend
that you run a proof-of-concept test to determine if T3 brokers are sufficient for production or
critical workload. To learn more about T3 broker sizes, see Amazon EC2 T3 Instances.
For more information on how to choose broker sizes, see Best practices.
Create a provisioned Amazon MSK cluster using the AWS Management
Console
This process describes the common task of creating a provisioned Amazon MSK cluster using
custom create options in the AWS Management Console. You can select other options in the AWS
Management Console to create a serverless cluster.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose Create cluster.
3. For Cluster creation method, choose Custom create.
4. Specify a Cluster name that is unique and no more than 64 characters.
5. For Cluster type, choose Provisioned, which allows you to specify the number of brokers,
broker size, and cluster storage capacity.
6. Select the Apache Kafka version that you want to run on the brokers. To see a comparison of
Amazon MSK features that are supported by each Apache Kafka version, select View version
compatibility.
7. Depending on the Apache Kafka version you select, you may have the option to choose the
cluster’s Metadata mode: ZooKeeper or KRaft.
8. Select a broker size to use for the cluster based on the cluster's compute, memory, and
storage needs. See the section called “Amazon MSK broker sizes”.
9. Select the Number of zones across which brokers are distributed.
10. Specify the number of brokers you want MSK to create in each Availability Zone. The minimum
is one broker per Availability Zone and the maximum is 30 brokers per cluster for ZooKeeper-
based clusters and 60 brokers per cluster for KRaft-based clusters.
11. Select the initial amount of Storage you want your cluster to have. You can't decrease storage
capacity after you create the cluster.
12. Depending on the broker size (instance size) you selected, you can specify Provisioned
storage throughput per broker. To enable this option, choose broker size (instance size)
kafka.m5.4xlarge or larger for x86, and kafka.m7g.2xlarge or larger for Graviton-based
instances. See the section called “Provision storage throughput for brokers in an Amazon MSK cluster”.
13. Select a Cluster storage mode option, either EBS storage only or tiered storage and EBS
storage.
14. If you want to create and use a custom Cluster configuration (or if you already have a
cluster configuration saved), choose a configuration. Otherwise, you can create the cluster
using the Amazon MSK default cluster configuration. For information about Amazon MSK
configurations, see Amazon MSK configuration.
15. Select Next.
16. For Networking settings, choose the VPC you want to use for the cluster.
17. Based on the Number of zones you previously selected, specify the Availability Zones and
subnets where brokers will deploy. The subnets must be in different Availability Zones.
18. You can select one or more security groups that you want to give access to your cluster (for
example, the security groups of client machines). If you specify security groups that are shared
with you, you must ensure that you have permissions to use them. Specifically, you need the
ec2:DescribeSecurityGroups permission. For more information, see Connecting to an MSK cluster.
19. Select Next.
20. Select the cluster's Access control methods and Encryption settings for encrypting data as
it transits between clients and brokers. For more information, see the section called “Amazon
MSK encryption in transit”.
21. Choose the kind of KMS key that you want to use for encrypting data at rest. For more
information, see the section called “Amazon MSK encryption at rest”.
22. Select Next.
23. Choose the Monitoring and tags settings that you want: Amazon CloudWatch, Prometheus,
Broker log delivery, or Cluster tags. This determines the set of metrics you get. For more
information, see Monitor a cluster. Then select Next.
24. Review the settings for your cluster. You can go back and change settings by selecting
Previous to go back to the previous console screen, or Edit to change specific cluster settings.
If the settings are correct, select Create cluster.
25. Check the cluster Status on the Cluster summary page. The status changes from Creating to
Active as Amazon MSK provisions the cluster. When the status is Active, you can connect to
the cluster. For more information about cluster status, see Understand cluster states.
Create a provisioned Amazon MSK cluster using the AWS CLI
1. Copy the following JSON and save it to a file. Name the file brokernodegroupinfo.json. Replace the subnet IDs in the JSON with the values that correspond to your subnets. These
subnets must be in different Availability Zones. Replace "Security-Group-ID" with
the ID of one or more security groups of the client VPC. Clients associated with these
security groups get access to the cluster. If you specify security groups that were shared
with you, you must ensure that you have permissions to them. Specifically, you need the
ec2:DescribeSecurityGroups permission. For an example, see Amazon EC2: Allows
Managing Amazon EC2 Security Groups Associated With a Specific VPC, Programmatically and
in the Console. Finally, save the updated JSON file on the computer where you have the AWS
CLI installed.
{
"InstanceType": "kafka.m5.large",
"ClientSubnets": [
"Subnet-1-ID",
"Subnet-2-ID"
],
"SecurityGroups": [
"Security-Group-ID"
]
}
Important
Specify exactly two subnets if you are using the US West (N. California) Region. For
other Regions where Amazon MSK is available, you can specify either two or three
subnets. The subnets that you specify must be in distinct Availability Zones. When you
create a provisioned MSK cluster, Amazon MSK distributes the broker nodes evenly
across the subnets that you specify.
2. Run the following AWS CLI command in the directory where you saved the
brokernodegroupinfo.json file, replacing "Your-Cluster-Name" with a name of
your choice. For "Monitoring-Level", you can specify one of the following three values:
DEFAULT, PER_BROKER, or PER_TOPIC_PER_BROKER. For information about these three
different levels of monitoring, see Monitor a cluster. The enhanced-monitoring parameter is optional.
If you don't specify it in the create-cluster command, you get the DEFAULT level of
monitoring.
aws kafka create-cluster --cluster-name "Your-Cluster-Name" --broker-node-group-
info file://brokernodegroupinfo.json --kafka-version "2.8.1" --number-of-broker-
nodes 3 --enhanced-monitoring "Monitoring-Level"
The output of the command looks like the following JSON:
{
"ClusterArn": "...",
"ClusterName": "AWSKafkaTutorialCluster",
"State": "CREATING"
}
Note
The create-cluster command might return an error stating that one or more
subnets belong to unsupported Availability Zones. When this happens, the error
indicates which Availability Zones are unsupported. Create subnets that don't use the
unsupported Availability Zones and try the create-cluster command again.
3. Save the value of the ClusterArn key because you need it to perform other actions on your cluster.
4. Run the following command to check your cluster STATE. The STATE value changes from CREATING to ACTIVE as Amazon MSK provisions the cluster. When the state is ACTIVE, you can connect to the cluster. For more information about cluster status, see Understand cluster states.
aws kafka describe-cluster --cluster-arn <your-cluster-ARN>
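If you only want the state, you can extract it from the response with the AWS CLI --query option, which can be convenient in scripts. For example:
aws kafka describe-cluster --cluster-arn <your-cluster-ARN> --query 'ClusterInfo.State' --output text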
Create a provisioned Amazon MSK cluster with a custom Amazon MSK
configuration using the AWS CLI
For information about custom Amazon MSK configurations and how to create them, see Amazon
MSK configuration.
1. Save the following JSON to a file, replacing configuration-arn with the ARN of the configuration that you want to use to create the cluster.
{
"Arn": configuration-arn,
"Revision": 1
}
2. Run the create-cluster command and use the configuration-info option to point to the JSON file you saved in the previous step. The following is an example.
aws kafka create-cluster --cluster-name ExampleClusterName --broker-node-group-info file://brokernodegroupinfo.json --kafka-version "2.8.1" --number-of-broker-nodes 3 --enhanced-monitoring PER_TOPIC_PER_BROKER --configuration-info file://configuration.json
The following is an example of a successful response after running this command.
{
"ClusterArn": "arn:aws:kafka:us-east-1:123456789012:cluster/
CustomConfigExampleCluster/abcd1234-abcd-dcba-4321-a1b2abcd9f9f-2",
"ClusterName": "CustomConfigExampleCluster",
"State": "CREATING"
}
Create a provisioned Amazon MSK cluster using the Amazon MSK API
The Amazon MSK API allows you to programmatically create and manage your provisioned Amazon
MSK cluster as part of automated infrastructure provisioning or deployment scripts.
To create a provisioned Amazon MSK cluster using the API, see CreateCluster.
Delete a provisioned Amazon MSK cluster
Note
If your provisioned Amazon MSK cluster has an auto-scaling policy, we recommend that
you remove the policy before you delete the cluster. For more information, see Automatic
scaling for Amazon MSK clusters.
Topics
Delete a provisioned Amazon MSK cluster using the AWS Management Console
Delete a provisioned Amazon MSK cluster using the AWS CLI
Delete a provisioned Amazon MSK cluster using the API
Delete a provisioned Amazon MSK cluster using the AWS Management
Console
This process describes how to delete a provisioned Amazon MSK cluster using the AWS
Management Console. Before you delete an MSK cluster, ensure that you have a backup of any
important data stored in the cluster and that there aren't any scheduled tasks dependent on the
cluster. You can't undo an MSK cluster deletion.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster that you want to delete by selecting the check box next to it.
3. Choose Delete, and then confirm deletion.
Delete a provisioned Amazon MSK cluster using the AWS CLI
This process describes how to delete a provisioned Amazon MSK cluster using the AWS CLI. Before
you delete an MSK cluster, ensure that you have a backup of any important data stored in the cluster
and that there aren't any scheduled tasks dependent on the cluster. You can't undo an MSK cluster
deletion.
Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can
find it by listing all clusters. For more information, see the section called “List clusters”.
aws kafka delete-cluster --cluster-arn ClusterArn
Delete a provisioned Amazon MSK cluster using the API
The Amazon MSK API allows you to programmatically create and manage your provisioned Amazon
MSK cluster as part of automated infrastructure provisioning or deployment scripts. This process
describes how to delete a provisioned Amazon MSK cluster using the Amazon MSK API. Before you
delete an Amazon MSK cluster, ensure that you have a backup of any important data stored in the
cluster and that there aren't any scheduled tasks dependent on the cluster. You can't undo an MSK
cluster deletion.
To delete a cluster using the API, see DeleteCluster.
Get the bootstrap brokers for an Amazon MSK cluster
The bootstrap brokers refer to the list of brokers that an Apache Kafka client can use to connect
to an Amazon MSK cluster. This list may not include all the brokers in the cluster. You can get
bootstrap brokers using the AWS Management Console, AWS CLI, or Amazon MSK API.
Topics
Get the bootstrap brokers using the AWS Management Console
Get the bootstrap brokers using the AWS CLI
Get the bootstrap brokers using the API
Get the bootstrap brokers using the AWS Management Console
This process describes how to get bootstrap brokers for a cluster using the AWS Management
Console. The term bootstrap brokers refers to a list of brokers that an Apache Kafka client can use
as a starting point to connect to the cluster. This list doesn't necessarily include all of the brokers in
a cluster.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. The table shows all the clusters for the current region under this account. Choose the name of
a cluster to view its description.
3. On the Cluster summary page, choose View client information. This shows you the bootstrap
brokers, as well as the Apache ZooKeeper connection string.
Get the bootstrap brokers using the AWS CLI
Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can
find it by listing all clusters. For more information, see the section called “List clusters”.
aws kafka get-bootstrap-brokers --cluster-arn ClusterArn
For an MSK cluster that uses the section called “IAM access control”, the output of this command
looks like the following JSON example.
{
"BootstrapBrokerStringSaslIam": "b-1.myTestCluster.123z8u.c2.kafka.us-
west-1.amazonaws.com:9098,b-2.myTestCluster.123z8u.c2.kafka.us-
west-1.amazonaws.com:9098"
}
The following example shows the bootstrap brokers for a cluster that has public access
turned on. Use the BootstrapBrokerStringPublicSaslIam for public access, and the
BootstrapBrokerStringSaslIam string for access from within AWS.
{
"BootstrapBrokerStringPublicSaslIam": "b-2-public.myTestCluster.v4ni96.c2.kafka-
beta.us-east-1.amazonaws.com:9198,b-1-public.myTestCluster.v4ni96.c2.kafka-
beta.us-east-1.amazonaws.com:9198,b-3-public.myTestCluster.v4ni96.c2.kafka-beta.us-
east-1.amazonaws.com:9198",
"BootstrapBrokerStringSaslIam": "b-2.myTestCluster.v4ni96.c2.kafka-
beta.us-east-1.amazonaws.com:9098,b-1.myTestCluster.v4ni96.c2.kafka-beta.us-
east-1.amazonaws.com:9098,b-3.myTestCluster.v4ni96.c2.kafka-beta.us-
east-1.amazonaws.com:9098"
}
The bootstrap brokers string should contain three brokers from across the Availability Zones in
which your MSK cluster is deployed (unless only two brokers are available).
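If you need just one of these strings in a script, you can extract it with the AWS CLI --query option. For example, the following returns only the IAM bootstrap broker string shown in the first example.
aws kafka get-bootstrap-brokers --cluster-arn ClusterArn --query 'BootstrapBrokerStringSaslIam' --output text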
Get the bootstrap brokers using the API
To get the bootstrap brokers using the API, see GetBootstrapBrokers.
List Amazon MSK clusters
To perform many operations on an Amazon MSK cluster, such as getting its bootstrap brokers, you
need the cluster's Amazon Resource Name (ARN). If you don't have the ARN for your cluster, you
can find it by listing all the clusters in your account.
Topics
List clusters using the AWS Management Console
List clusters using the AWS CLI
List clusters using the API
List clusters using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. The table shows all the clusters for the current region under this account. Choose the name of
a cluster to view its details.
List clusters using the AWS CLI
To list the clusters in your account, run the following command.
aws kafka list-clusters
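The response contains a ClusterInfoList array with the details of each cluster. If you only need the cluster ARNs, for example to feed into other commands, you can extract them with the --query option:
aws kafka list-clusters --query 'ClusterInfoList[*].ClusterArn' --output text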
List clusters using the API
To list clusters using the API, see ListClusters.
Metadata management
Amazon MSK supports Apache ZooKeeper or KRaft metadata management modes.
From Apache Kafka version 3.7.x on Amazon MSK, you can create clusters that use KRaft mode
instead of ZooKeeper mode. KRaft-based clusters rely on controllers within Kafka to manage
metadata.
Topics
ZooKeeper mode
KRaft mode
ZooKeeper mode
Apache ZooKeeper is "a centralized service for maintaining configuration information, naming,
providing distributed synchronization, and providing group services. All of these kinds of services
are used in some form or another by distributed applications," including Apache Kafka.
If your cluster is using ZooKeeper mode, you can use the steps below to get the Apache ZooKeeper
connection string. However, we recommend that you use the BootstrapServerString to
connect to your cluster and perform admin operations, as the --zookeeper flag was
deprecated in Kafka 2.5 and removed in Kafka 3.0.
Getting the Apache ZooKeeper connection string using the AWS Management
Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. The table shows all the clusters for the current region under this account. Choose the name of
a cluster to view its description.
3. On the Cluster summary page, choose View client information. This shows you the bootstrap
brokers, as well as the Apache ZooKeeper connection string.
Getting the Apache ZooKeeper connection string using the AWS CLI
1. If you don't know the Amazon Resource Name (ARN) of your cluster, you can find it by listing
all the clusters in your account. For more information, see the section called “List clusters”.
2. To get the Apache ZooKeeper connection string, along with other information about your
cluster, run the following command, replacing ClusterArn with the ARN of your cluster.
aws kafka describe-cluster --cluster-arn ClusterArn
The output of this describe-cluster command looks like the following JSON example.
{
"ClusterInfo": {
"BrokerNodeGroupInfo": {
"BrokerAZDistribution": "DEFAULT",
"ClientSubnets": [
"subnet-0123456789abcdef0",
"subnet-2468013579abcdef1",
"subnet-1357902468abcdef2"
],
"InstanceType": "kafka.m5.large",
"StorageInfo": {
"EbsStorageInfo": {
"VolumeSize": 1000
}
}
},
"ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/
testcluster/12345678-abcd-4567-2345-abcdef123456-2",
"ClusterName": "testcluster",
"CreationTime": "2018-12-02T17:38:36.75Z",
"CurrentBrokerSoftwareInfo": {
"KafkaVersion": "2.2.1"
},
"CurrentVersion": "K13V1IB3VIYZZH",
"EncryptionInfo": {
"EncryptionAtRest": {
"DataVolumeKMSKeyId": "arn:aws:kms:us-
east-1:555555555555:key/12345678-abcd-2345-ef01-abcdef123456"
}
},
"EnhancedMonitoring": "DEFAULT",
"NumberOfBrokerNodes": 3,
"State": "ACTIVE",
"ZookeeperConnectString": "10.0.1.101:2018,10.0.2.101:2018,10.0.3.101:2018"
}
}
The previous JSON example shows the ZookeeperConnectString key in the output of the
describe-cluster command. Copy the value corresponding to this key and save it for when
you need to create a topic on your cluster.
Important
Your Amazon MSK cluster must be in the ACTIVE state for you to be able to
obtain the Apache ZooKeeper connection string. When a cluster is still in the
CREATING state, the output of the describe-cluster command doesn't include
ZookeeperConnectString. If this is the case, wait a few minutes and then run the
describe-cluster command again after your cluster reaches the ACTIVE state.
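If you only need the connection string, you can extract it from the describe-cluster response with the AWS CLI --query option. For example:
aws kafka describe-cluster --cluster-arn ClusterArn --query 'ClusterInfo.ZookeeperConnectString' --output text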
Getting the Apache ZooKeeper connection string using the API
To get the Apache ZooKeeper connection string using the API, see DescribeCluster.
KRaft mode
Amazon MSK introduced support for KRaft (Apache Kafka Raft) in Kafka version 3.7.x. The Apache
Kafka community developed KRaft to replace Apache ZooKeeper for metadata management
in Apache Kafka clusters. In KRaft mode, cluster metadata is propagated within a group of
Kafka controllers, which are part of the Kafka cluster, instead of across ZooKeeper nodes.
KRaft controllers are included at no additional cost to you, and require no additional setup or
management from you. See KIP-500 for more information about KRaft.
Here are some points to note about KRaft mode on MSK:
KRaft mode is only available for new clusters. You cannot switch metadata modes once the
cluster is created.
On the MSK console, you can create a KRaft-based cluster by choosing Kafka version 3.7.x and
selecting the KRaft checkbox in the cluster creation window.
To create a cluster in KRaft mode using the MSK API CreateCluster or CreateClusterV2
operations, you should use 3.7.x.kraft as the version. Use 3.7.x as the version to create a
cluster in ZooKeeper mode.
The number of partitions per broker is the same on KRaft and ZooKeeper based clusters.
However, KRaft allows you to host more partitions per cluster by provisioning more brokers in a
cluster.
There are no API changes required to use KRaft mode on Amazon MSK. However, if your clients
still use the --zookeeper connection string today, you should update your clients to use the
--bootstrap-server connection string to connect to your cluster. The --zookeeper flag
is deprecated in Apache Kafka version 2.5 and is removed starting with Kafka version 3.0. We
therefore recommend you use recent Apache Kafka client versions and the --bootstrap-
server connection string for all connections to your cluster.
ZooKeeper mode continues to be available for all released versions where ZooKeeper is also
supported by Apache Kafka. See Supported Apache Kafka versions for details on the end of
support for Apache Kafka versions and future updates.
You should check that any tools you use are capable of using Kafka Admin APIs without
ZooKeeper connections. Refer to Use LinkedIn's Cruise Control for Apache Kafka with Amazon
MSK for updated steps to connect your cluster to Cruise Control. Cruise Control also has
instructions for running Cruise Control without ZooKeeper.
You do not need to access your cluster's KRaft controllers directly for any administrative actions.
However, if you are using open monitoring to collect metrics, you also need the DNS endpoints
of your controllers in order to collect some non-controller related metrics about your cluster.
You can get these DNS endpoints from the MSK Console or using the ListNodes API operation.
See Monitor MSK cluster with Prometheus for updated steps for setting up open-monitoring for
KRaft-based clusters.
There are no additional CloudWatch metrics you need to monitor for KRaft mode clusters over
ZooKeeper mode clusters. MSK manages the KRaft controllers used in your clusters.
You can continue managing ACLs in KRaft mode clusters using the --bootstrap-server
connection string; for an example, see the sketch after this list. You should not use the
--zookeeper connection string to manage ACLs. See Apache Kafka ACLs.
In KRaft mode, your cluster's metadata is stored on KRaft controllers within Kafka and not on
external ZooKeeper nodes. Therefore, you don't need to control access to controller nodes
separately as you do with ZooKeeper nodes.
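As an example of performing an administrative action over the --bootstrap-server connection string, the following command lists the ACLs on a cluster. This is a sketch: $bs stands for your bootstrap broker string, and client.properties is a client configuration file like the one described in Step 4: Create a topic in the Amazon MSK cluster.
bin/kafka-acls.sh --bootstrap-server $bs --command-config client.properties --list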
Storage management for Amazon MSK clusters
Amazon MSK provides features to help you with storage management on your MSK clusters.
Topics
Tiered storage for Amazon MSK clusters
Scale up Amazon MSK broker storage
Provision storage throughput for brokers in an Amazon MSK cluster
Tiered storage for Amazon MSK clusters
Tiered storage is a low-cost storage tier for Amazon MSK that scales to virtually unlimited storage,
making it cost-effective to build streaming data applications.
You can create an Amazon MSK cluster configured with tiered storage that balances performance
and cost. Amazon MSK stores streaming data in a performance-optimized primary storage tier until
it reaches the Apache Kafka topic retention limits. Then, Amazon MSK automatically moves data
into the new low-cost storage tier.
When your application starts reading data from the tiered storage, you can expect an increase in
read latency for the first few bytes. As you start reading the remaining data sequentially from the
low-cost tier, you can expect latencies that are similar to the primary storage tier. You don't need
to provision any storage for the low-cost tiered storage or manage the infrastructure. You can
store any amount of data and pay only for what you use. This feature is compatible with the APIs
introduced in KIP-405: Kafka Tiered Storage.
Here are some of the features of tiered storage:
You can scale to virtually unlimited storage. You don't have to guess how to scale your Apache
Kafka infrastructure.
You can retain data longer in your Apache Kafka topics, or increase your topic storage, without
the need to increase the number of brokers.
It provides a longer duration safety buffer to handle unexpected delays in processing.
You can reprocess old data in its exact production order with your existing stream processing
code and Kafka APIs.
Partitions rebalance faster because data on secondary storage doesn't require replication across
broker disks.
Data between brokers and the tiered storage moves within the VPC and doesn't travel through
the internet.
A client machine can use the same process to connect to new clusters with tiered storage
enabled as it does to connect to a cluster without tiered storage enabled. See Create a client
machine.
Tiered storage requirements for Amazon MSK clusters
You must use Apache Kafka client version 3.0.0 or higher to create a new topic with tiered
storage enabled. To transition an existing topic to tiered storage, you can reconfigure a client
machine that uses a Kafka client version lower than 3.0.0 (minimum supported Apache Kafka
version is 2.8.2.tiered) to enable tiered storage. See Step 4: Create a topic in the Amazon MSK
cluster.
The Amazon MSK cluster with tiered storage enabled must use version 3.6.0 or higher, or
2.8.2.tiered.
Tiered storage constraints and limitations for Amazon MSK clusters
Tiered storage has the following constraints and limitations:
Make sure clients are not configured to read_committed when reading from the remote_tier in
Amazon MSK, unless the application is actively using the transactions feature.
Tiered storage isn't available in AWS GovCloud (US) regions.
Tiered storage applies only to provisioned mode clusters.
Tiered storage doesn’t support broker size t3.small.
The minimum retention period in low-cost storage is 3 days. There is no minimum retention
period for primary storage.
Tiered storage doesn’t support Multiple Log directories on a broker (JBOD related features).
Tiered storage does not support compacted topics. Ensure that all topics that have tiered storage
turned on have their cleanup.policy configured to 'DELETE' only.
Tiered Storage can be disabled for individual topics but not for the entire cluster. Once disabled,
tiered storage cannot be re-enabled for a topic.
If you use Amazon MSK version 2.8.2.tiered, you can migrate only to another tiered storage-
supported Apache Kafka version. If you don't want to continue using a tiered storage-supported
version, create a new MSK cluster and migrate your data to it.
The kafka-log-dirs tool can't report tiered storage data size. The tool only reports the size of the
log segments in primary storage.
How log segments are copied to tiered storage for an Amazon MSK topic
When you enable tiered storage for a new or existing topic, Apache Kafka copies closed log
segments from primary storage to tiered storage.
Apache Kafka only copies closed log segments. It copies all messages within the log segment to
tiered storage.
Active segments are not eligible for tiering. The log segment size (segment.bytes) or the
segment roll time (segment.ms) controls the rate of segment closure, and the rate Apache Kafka
then copies them to tiered storage.
Retention settings for a topic with tiered storage enabled are different from settings for a topic
without tiered storage enabled. The following rules control the retention of messages in topics
with tiered storage enabled:
You define retention in Apache Kafka with two settings: log.retention.ms (time) and
log.retention.bytes (size). These settings determine the total duration and size of the data that
Apache Kafka retains in the cluster. Whether or not you enable tiered storage mode, you set
these configurations at the cluster level. You can override the settings at the topic level with
topic configurations.
When you enable tiered storage, you can additionally specify how long the primary
high-performance storage tier stores data. For example, if a topic has overall retention
(log.retention.ms) setting of 7 days and local retention (local.retention.ms) of 12 hours, then the
cluster primary storage retains data for only the first 12 hours. The low-cost storage tier retains
the data for the full 7 days.
The usual retention settings apply to the full log. This includes its tiered and primary parts.
The local.retention.ms or local.retention.bytes settings control the retention of messages
in primary storage. When data has reached primary storage retention setting thresholds
(local.retention.ms/bytes) on a full log, Apache Kafka copies the data in primary storage to tiered
storage. The data is then eligible for expiration.
When Apache Kafka copies a message in a log segment to tiered storage, it removes the message
from the cluster based on retention.ms or retention.bytes settings.
Example Amazon MSK tiered storage scenario
This scenario illustrates how an existing topic that has messages in primary storage
behaves when tiered storage is enabled. You enable tiered storage on this topic by setting
remote.storage.enable to true. In this example, retention.ms is set to 5 days and
local.retention.ms is set to 2 days. The following is the sequence of events when a segment expires.
Time T0 - Before you enable tiered storage.
Before you enable tiered storage for this topic, there are two log segments. One of the segments is
active for an existing topic partition 0.
Time T1 (< 2 days) - Tiered storage enabled. Segment 0 copied to tiered storage.
After you enable tiered storage for this topic, Apache Kafka copies log segment 0 to tiered storage
after the segment meets initial retention settings. Apache Kafka also retains the primary storage
copy of segment 0. The active segment 1 is not eligible to copy over to tiered storage yet. In this
timeline, Amazon MSK doesn't apply any of the retention settings yet for any of the messages in
segment 0 and segment 1. (local.retention.bytes/ms, retention.ms/bytes)
Time T2 - Local retention in effect.
After 2 days, primary retention settings take effect for the segment 0 that Apache Kafka copied
to the tiered storage. The setting of local.retention.ms as 2 days determines this. Segment 0 now
expires from the primary storage. Active segment 1 is neither eligible for expiration nor eligible to
copy over to tiered storage yet.
Time T3 - Overall retention in effect.
After 5 days, retention settings take effect, and Kafka clears log segment 0 and associated
messages from tiered storage. Segment 1 is neither eligible for expiration nor eligible to copy over
to tiered storage yet because it is active. Segment 1 is not yet closed, so it is ineligible for segment
roll.
Create an Amazon MSK cluster with tiered storage using the AWS Management
Console
This process describes how to create a tiered storage Amazon MSK cluster using the AWS
Management Console.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose Create cluster.
3. For the cluster creation method, choose Custom create.
4. Specify a name for the cluster.
5. In the Cluster type, select Provisioned.
6. Choose an Apache Kafka version that supports tiered storage for Amazon MSK to use to
create the cluster.
7. Specify a broker size other than kafka.t3.small.
8. Select the number of brokers that you want Amazon MSK to create in each Availability Zone.
The minimum is one broker per Availability Zone, and the maximum is 30 brokers per cluster.
9. Specify the number of zones that brokers are distributed across.
10. Specify the number of Apache Kafka brokers that are deployed per zone.
11. Select Storage options. This includes Tiered storage and EBS storage to enable tiered storage
mode.
12. Follow the remaining steps in the cluster creation wizard. When complete, Tiered storage and
EBS storage appears as the cluster storage mode in the Review and create view.
13. Select Create cluster.
Create an Amazon MSK cluster with tiered storage with the AWS CLI
To enable tiered storage on a cluster, create the cluster with the correct Apache Kafka version and
attribute for tiered storage. Follow the code example below. Also, complete the steps in the next
section to Create a Kafka topic with tiered storage enabled.
See create-cluster for a complete list of supported attributes for cluster creation.
aws kafka create-cluster \
    --cluster-name "MessagingCluster" \
    --broker-node-group-info file://brokernodegroupinfo.json \
    --number-of-broker-nodes 3 \
    --kafka-version "3.6.0" \
    --storage-mode "TIERED"
Create a Kafka topic with tiered storage enabled
To complete the process that you started when you created a cluster with the tiered storage
enabled, also create a topic with tiered storage enabled with the attributes in the later code
example. The attributes specifically for tiered storage are the following:
local.retention.ms (for example, 10 mins) for time-based retention settings or
local.retention.bytes for log segment size limits.
remote.storage.enable set to true to enable tiered storage.
The following configuration uses local.retention.ms, but you can replace this attribute with
local.retention.bytes. These attributes control the amount of time that can pass, or the number of
bytes that can accumulate, before Apache Kafka expires data from primary storage after copying it to tiered storage.
See Topic-level configuration for more details on supported configuration attributes.
Note
You must use Apache Kafka client version 3.0.0 or above; the remote.storage.enable
setting is available in kafka-topics.sh only in those client versions. To enable tiered
storage on an existing topic that uses an earlier version of Apache Kafka, see the section
Enabling tiered storage on an existing Amazon MSK topic.
bin/kafka-topics.sh --create --bootstrap-server $bs --replication-factor 2 --partitions 6 --topic MSKTutorialTopic --config remote.storage.enable=true --config local.retention.ms=100000 --config retention.ms=604800000 --config segment.bytes=134217728
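To verify the topic settings after creation, you can describe the topic configuration. This optional check reuses the $bs variable from the previous command; the output should include remote.storage.enable=true and the retention values that you set.
bin/kafka-configs.sh --bootstrap-server $bs --describe --entity-type topics --entity-name MSKTutorialTopic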
Enable and disable tiered storage on an existing Amazon MSK topic
These sections cover how to enable and disable tiered storage on a topic that you've already
created. To create a new cluster and topic with tiered storage enabled, see Creating a cluster with
tiered storage using the AWS Management Console.
Enabling tiered storage on an existing Amazon MSK topic
To enable tiered storage on an existing topic, use the alter command syntax in the following
example. When you enable tiered storage on an already existing topic, you aren't restricted to a
certain Apache Kafka client version.
bin/kafka-configs.sh --bootstrap-server $bsrv --alter --entity-type topics --entity-name msk-ts-topic --add-config 'remote.storage.enable=true, local.retention.ms=604800000, retention.ms=15550000000'
Disable tiered storage on an existing Amazon MSK topic
To disable tiered storage on an existing topic, use the alter command syntax in the same order as
when you enable tiered storage.
bin/kafka-configs.sh --bootstrap-server $bs --alter --entity-type topics --entity-name MSKTutorialTopic --add-config 'remote.log.msk.disable.policy=Delete, remote.storage.enable=false'
Note
When you disable tiered storage, you completely delete the topic data in tiered storage.
Apache Kafka retains primary storage data, but it still applies the primary retention rules
based on local.retention.ms. After you disable tiered storage on a topic, you can't re-
enable it. If you want to disable tiered storage on an existing topic, you aren't restricted to
a certain Apache Kafka client version.
Enable tiered storage on an existing Amazon MSK cluster using AWS CLI
Note
You can enable tiered storage only if your cluster's log.cleanup.policy is set to delete,
as compacted topics are not supported on tiered storage. Later, you can configure an
individual topic's log.cleanup.policy to compact if tiered storage is not enabled on that
particular topic. See Topic-level configuration for more details on supported configuration
attributes.
1. Update the Kafka version – Cluster versions aren't simple integers. To find the current version
of the cluster, use the DescribeCluster operation or the describe-cluster AWS CLI
command. An example version is KTVPDKIKX0DER.
aws kafka update-cluster-kafka-version --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-kafka-version 3.6.0
2. Edit cluster storage mode. The following code example shows editing the cluster storage mode
to TIERED using the update-storage API.
aws kafka update-storage --current-version Current-Cluster-Version --cluster-arn Cluster-arn --storage-mode TIERED
Update tiered storage on an existing Amazon MSK cluster using the console
Note
You can enable tiered storage only if your cluster's log.cleanup.policy is set to delete,
as compacted topics are not supported on tiered storage. Later, you can configure an
individual topic's log.cleanup.policy to compact if tiered storage is not enabled on that
particular topic. See Topic-level configuration for more details on supported configuration
attributes.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Go to the cluster summary page and choose Properties.
3. Go to the Storage section and choose Edit cluster storage mode.
4. Choose Tiered storage and EBS storage and Save changes.
Scale up Amazon MSK broker storage
You can increase the amount of EBS storage per broker. You can't decrease the storage.
Storage volumes remain available during this scaling-up operation.
Important
When storage is scaled for an MSK cluster, the additional storage is made available right
away. However, the cluster requires a cool-down period after every storage scaling event.
Amazon MSK uses this cool-down period to optimize the cluster before it can be scaled
again. This period can range from a minimum of 6 hours to over 24 hours, depending
on the cluster's storage size and utilization and on traffic. This is applicable for both
auto scaling events and manual scaling using the UpdateBrokerStorage operation. For
information about right-sizing your storage, see Best practices.
You can use tiered storage to scale up to virtually unlimited amounts of storage for your brokers.
See Tiered storage for Amazon MSK clusters.
Topics
Automatic scaling for Amazon MSK clusters
Manual scaling
Automatic scaling for Amazon MSK clusters
To automatically expand your cluster's storage in response to increased usage, you can configure
an Application Auto-Scaling policy for Amazon MSK. In an auto-scaling policy, you set the target
disk utilization and the maximum scaling capacity.
Before you use automatic scaling for Amazon MSK, you should consider the following:
Important
A storage scaling action can occur only once every six hours.
We recommend that you start with a right-sized storage volume for your storage demands. For
guidance on right-sizing your cluster, see Right-size your cluster: Number of brokers per cluster.
Amazon MSK does not reduce cluster storage in response to reduced usage. Amazon MSK does
not support decreasing the size of storage volumes. If you need to reduce the size of your cluster
storage, you must migrate your existing cluster to a cluster with smaller storage. For information
about migrating a cluster, see Migrate to Amazon MSK Cluster.
Amazon MSK does not support automatic scaling in the Asia Pacific (Osaka) and Africa (Cape
Town) Regions.
When you associate an auto-scaling policy with your cluster, Amazon EC2 Auto Scaling
automatically creates an Amazon CloudWatch alarm for target tracking. If you delete a cluster
with an auto-scaling policy, this CloudWatch alarm persists. To delete the CloudWatch alarm, you
should remove an auto-scaling policy from a cluster before you delete the cluster. To learn more
about target tracking, see Target tracking scaling policies for Amazon EC2 Auto Scaling in the
Amazon EC2 Auto Scaling User Guide.
Topics
Auto-scaling policy details for Amazon MSK
Set up automatic scaling for your Amazon MSK cluster
Auto-scaling policy details for Amazon MSK
An auto-scaling policy defines the following parameters for your cluster:
Storage Utilization Target: The storage utilization threshold that Amazon MSK uses to trigger
an auto-scaling operation. You can set the utilization target between 10% and 80% of the
current storage capacity. We recommend that you set the Storage Utilization Target between
50% and 60%.
Maximum Storage Capacity: The maximum scaling limit that Amazon MSK can set for your
broker storage. You can set the maximum storage capacity up to 16 TiB per broker. For more
information, see Amazon MSK quota.
When Amazon MSK detects that your Maximum Disk Utilization metric is equal to or greater
than the Storage Utilization Target setting, it increases your storage capacity by an
amount equal to the larger of two numbers: 10 GiB or 10% of current storage. For example, if you
have 1000 GiB, that amount is 100 GiB. The service checks your storage utilization every minute.
Further scaling operations continue to increase storage by an amount equal to the larger of two
numbers: 10 GiB or 10% of current storage.
To determine if auto-scaling operations have occurred, use the ListClusterOperations operation.
Set up automatic scaling for your Amazon MSK cluster
You can use the Amazon MSK console, the Amazon MSK API, or AWS CloudFormation to implement
automatic scaling for storage. CloudFormation support is available through Application Auto
Scaling.
Note
You can't implement automatic scaling when you create a cluster. You must first create the
cluster, and then create and enable an auto-scaling policy for it. However, you can create
the policy while Amazon MSK service creates your cluster.
Topics
Set up automatic scaling using the Amazon MSK AWS Management Console
Set up automatic scaling for Amazon MSK using the CLI
Set up automatic-scaling for Amazon MSK using the API
Set up automatic scaling using the Amazon MSK AWS Management Console
This process describes how to use the Amazon MSK console to implement automatic scaling for
storage.
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. In the list of clusters, choose your cluster. This takes you to a page that lists details about the
cluster.
3. In the Auto scaling for storage section, choose Configure.
4. Create and name an auto-scaling policy. Specify the storage utilization target, the maximum
storage capacity, and the target metric.
5. Choose Save changes.
When you save and enable the new policy, the policy becomes active for the cluster. Amazon MSK
then expands the cluster's storage when the storage utilization target is reached.
Set up automatic scaling for Amazon MSK using the CLI
This process describes how to use the Amazon MSK CLI to implement automatic scaling for
storage.
1. Use the RegisterScalableTarget command to register a storage utilization target.
2. Use the PutScalingPolicy command to create an auto-expansion policy.
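The following is a minimal sketch of these two calls using the Application Auto Scaling CLI. The policy name, target value, and maximum capacity shown here are example values; replace ClusterArn with your cluster's ARN.
aws application-autoscaling register-scalable-target --service-namespace kafka --scalable-dimension "kafka:broker-storage:VolumeSize" --resource-id "ClusterArn" --min-capacity 1 --max-capacity 16384
aws application-autoscaling put-scaling-policy --service-namespace kafka --scalable-dimension "kafka:broker-storage:VolumeSize" --resource-id "ClusterArn" --policy-name msk-storage-policy --policy-type TargetTrackingScaling --target-tracking-scaling-policy-configuration '{"TargetValue": 60.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "KafkaBrokerStorageUtilization"}}'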
Set up automatic-scaling for Amazon MSK using the API
This process describes how to use the Amazon MSK API to implement automatic scaling for
storage.
1. Use the RegisterScalableTarget API to register a storage utilization target.
2. Use the PutScalingPolicy API to create an auto-expansion policy.
Manual scaling
To increase storage, wait for the cluster to be in the ACTIVE state. Storage scaling has a cool-down
period of at least six hours between events. Even though the operation makes additional storage
available right away, the service performs optimizations on your cluster that can take up to 24
hours or more. The duration of these optimizations is proportional to your storage size.
Scaling up broker storage using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster for which you want to update broker storage.
3. In the Storage section, choose Edit.
4. Specify the storage volume you want. You can only increase the amount of storage; you can't decrease it.
5. Choose Save changes.
Scaling up broker storage using the AWS CLI
Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can
find it by listing all clusters. For more information, see the section called “List clusters”.
Replace Current-Cluster-Version with the current version of the cluster.
Important
Cluster versions aren't simple integers. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster AWS CLI command. An example version is
KTVPDKIKX0DER.
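For example, this sketch returns just the current version string (ClusterArn is a placeholder):

aws kafka describe-cluster --cluster-arn ClusterArn --query 'ClusterInfo.CurrentVersion'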
The Target-Volume-in-GiB parameter represents the amount of storage that you want each
broker to have. It is only possible to update the storage for all the brokers. You can't specify
individual brokers for which to update storage. The value you specify for Target-Volume-in-GiB must be a whole number that is greater than 100 GiB. The storage per broker after the update
operation can't exceed 16384 GiB.
aws kafka update-broker-storage --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-broker-ebs-volume-info '{"KafkaBrokerNodeId": "All", "VolumeSizeGB": Target-Volume-in-GiB}'
Scaling up broker storage using the API
To update broker storage using the API, see UpdateBrokerStorage.
Provision storage throughput for brokers in an Amazon MSK cluster
Amazon MSK brokers persist data on storage volumes. Storage I/O is consumed when producers
write to the cluster, when data is replicated between brokers, and when consumers read data that
isn't in memory. The volume storage throughput is the rate at which data can be written into and
read from a storage volume. Provisioned storage throughput is the ability to specify that rate for
the brokers in your cluster.
You can specify the provisioned throughput rate in MiB per second for clusters whose brokers
are of size kafka.m5.4xlarge or larger and if the storage volume is 10 GiB or greater. It is
possible to specify provisioned throughput during cluster creation. You can also enable or disable
provisioned throughput for a cluster that is in the ACTIVE state.
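For an ACTIVE cluster, the UpdateStorage operation changes this setting. The following AWS CLI sketch enables provisioned throughput; the ARN, version, and the 300 MiB/s value are illustrative placeholders.

aws kafka update-storage --cluster-arn ClusterArn --current-version Current-Cluster-Version --provisioned-throughput '{"Enabled": true, "VolumeThroughput": 300}'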
Amazon MSK broker throughput bottlenecks and maximum throughput settings
There are multiple causes of bottlenecks in broker throughput: volume throughput, Amazon EC2 to
Amazon EBS network throughput, and Amazon EC2 egress throughput. You can enable provisioned
storage throughput to adjust volume throughput. However, broker throughput limitations can be
caused by Amazon EC2 to Amazon EBS network throughput and Amazon EC2 egress throughput.
Amazon EC2 egress throughput is impacted by the number of consumer groups and consumers per consumer group. Also, both Amazon EC2 to Amazon EBS network throughput and Amazon EC2
egress throughput are higher for larger broker sizes.
For volume sizes of 10 GiB or larger, you can provision storage throughput of 250 MiB per second
or greater. 250 MiB per second is the default. To provision storage throughput, you must choose
broker size kafka.m5.4xlarge or larger (or kafka.m7g.2xlarge or larger), and you can specify
maximum throughput as shown in the following table.
broker size Maximum storage throughput (MiB/second)
kafka.m5.4xlarge 593
kafka.m5.8xlarge 850
kafka.m5.12xlarge 1000
kafka.m5.16xlarge 1000
kafka.m5.24xlarge 1000
kafka.m7g.2xlarge 312.5
kafka.m7g.4xlarge 625
kafka.m7g.8xlarge 1000
kafka.m7g.12xlarge 1000
kafka.m7g.16xlarge 1000
Measure storage throughput of an Amazon MSK cluster
You can use the VolumeReadBytes and VolumeWriteBytes metrics to measure the average
storage throughput of a cluster. The sum of these two metrics gives the average storage
throughput in bytes. To get the average storage throughput for a cluster, set these two metrics to
SUM and the period to 1 minute, then use the following formula.
Average storage throughput in MiB/s = (Sum(VolumeReadBytes) + Sum(VolumeWriteBytes)) /
(60 * 1024 * 1024)
For information about the VolumeReadBytes and VolumeWriteBytes metrics, see the section
called “PER_BROKER Level monitoring”.
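As a sketch, the following CloudWatch command retrieves one broker's per-minute VolumeReadBytes sums; repeat it for VolumeWriteBytes and apply the formula above. The cluster name, broker ID, and time range are placeholders.

aws cloudwatch get-metric-statistics --namespace AWS/Kafka --metric-name VolumeReadBytes --dimensions Name="Cluster Name",Value="your-cluster-name" Name="Broker ID",Value="1" --statistics Sum --period 60 --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T01:00:00Z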
Configuration update values for provisioned storage in an Amazon MSK cluster
You can update your Amazon MSK configuration either before or after you turn on provisioned
throughput. However, you won't see the desired throughput until you perform both actions: update
the num.replica.fetchers configuration parameter and turn on provisioned throughput.
In the default Amazon MSK configuration, num.replica.fetchers has a value of 2. To update
your num.replica.fetchers, you can use the suggested values from the following table. These
values are for guidance purposes. We recommend that you adjust these values based on your use
case.
broker size num.replica.fetchers
kafka.m5.4xlarge 4
kafka.m5.8xlarge 8
kafka.m5.12xlarge 14
kafka.m5.16xlarge 16
kafka.m5.24xlarge 16
Your updated configuration can take up to 24 hours to take effect, and may take longer when a source volume is not fully utilized. However, transitional volume performance at least equals the performance of the source storage volumes during the migration period. A fully utilized 1 TiB volume typically takes about six hours to migrate to an updated configuration.
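For example, to raise num.replica.fetchers for kafka.m5.4xlarge brokers, you might save a properties file (the name server.properties is illustrative) that contains the line num.replica.fetchers=4, and then create a new MSK configuration from it:

aws kafka create-configuration --name "provisioned-throughput-config" --kafka-versions "2.8.1" --server-properties fileb://server.properties

You can then apply the new configuration with update-cluster-configuration, as described in Update the configuration of an Amazon MSK cluster.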
Provision Amazon MSK cluster storage throughput using the AWS Management
Console
This process shows an example of how you can use the AWS Management Console to create an Amazon MSK cluster with provisioned throughput enabled.
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. Choose Create cluster.
3. Choose Custom create.
4. Specify a name for the cluster.
5. In the Storage section, choose Enable.
6. Choose a value for storage throughput per broker.
7. Choose a VPC, zones and subnets, and a security group.
8. Choose Next.
9. At the bottom of the Security step, choose Next.
10. At the bottom of the Monitoring and tags step, choose Next.
11. Review the cluster settings, then choose Create cluster.
Provision Amazon MSK cluster storage throughput using the AWS CLI
This process shows an example of how you can use the AWS CLI to create a cluster with provisioned
throughput enabled.
1. Copy the following JSON and paste it into a file. Replace the subnet IDs and security group ID
placeholders with values from your account. Name the file cluster-creation.json and
save it.
{
"Provisioned": {
"BrokerNodeGroupInfo":{
"InstanceType":"kafka.m5.4xlarge",
"ClientSubnets":[
"Subnet-1-ID",
"Subnet-2-ID"
],
"SecurityGroups":[
"Security-Group-ID"
],
"StorageInfo": {
"EbsStorageInfo": {
"VolumeSize": 10,
"ProvisionedThroughput": {
"Enabled": true,
"VolumeThroughput": 250
}
}
}
},
"EncryptionInfo": {
"EncryptionInTransit": {
"InCluster": false,
"ClientBroker": "PLAINTEXT"
}
},
"KafkaVersion":"2.8.1",
"NumberOfBrokerNodes": 2
},
"ClusterName": "provisioned-throughput-example"
}
2. Run the following AWS CLI command from the directory where you saved the JSON file in the
previous step.
aws kafka create-cluster-v2 --cli-input-json file://cluster-creation.json
Provision storage throughput when creating an Amazon MSK cluster using the API
To configure provisioned storage throughput while creating a cluster, use CreateClusterV2.
Update the Amazon MSK cluster broker size
You can scale your MSK cluster on demand by changing the size of your brokers without
reassigning Apache Kafka partitions. Changing the size of your brokers gives you the flexibility
to adjust your MSK cluster's compute capacity based on changes in your workloads, without
interrupting your cluster I/O. Amazon MSK uses the same broker size for all the brokers in a given
cluster.
This section describes how to update the broker size for your MSK cluster. You can update your
cluster broker size from M5 or T3 to M7g, or from M7g to M5. Be aware that migrating to a smaller
broker size can decrease performance and reduce maximum achievable throughput per broker.
Migrating to a larger broker size can increase performance but may cost more.
The broker-size update happens in a rolling fashion while the cluster is up and running. This
means that Amazon MSK takes down one broker at a time to perform the broker-size update.
For information about how to make a cluster highly available during a broker-size update, see
the section called “Build highly available clusters”. To further reduce any potential impact on
productivity, you can perform the broker-size update during a period of low traffic.
During a broker-size update, you can continue to produce and consume data. However, you must
wait until the update is done before you can reboot brokers or invoke any of the update operations
listed under Amazon MSK operations.
If you want to update your cluster to a smaller broker size, we recommend that you try the update
on a test cluster first to see how it affects your scenario.
Important
You can't update a cluster to a smaller broker size if the number of partitions per broker exceeds the maximum number specified in the section called “Right-size your cluster: Number of partitions per broker”.
Update the Amazon MSK cluster broker size using the AWS
Management Console
This process shows how to update the Amazon MSK cluster broker size using the AWS Management Console.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster for which you want to update the broker size.
3. On the details page for the cluster, find the Brokers summary section, and choose Edit broker
size.
4. Choose the broker size you want from the list.
5. Choose Save changes.
Update the Amazon MSK cluster broker size using the AWS CLI
Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that
you obtained when you created your cluster. If you don't have the ARN for your cluster, you can
find it by listing all clusters. For more information, see the section called “List clusters”.
1. Replace Current-Cluster-Version with the current version of the cluster and TargetType with the new size that you want the brokers to be. To learn more about broker sizes, see the section called “Amazon MSK broker sizes”.

aws kafka update-broker-type --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-instance-type TargetType
The following is an example of how to use this command:
aws kafka update-broker-type --cluster-arn "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1" --current-version "K1X5R6FKA87" --target-instance-type kafka.m5.large
The output of this command looks like the following JSON example.
{
"ClusterArn": "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/
abcd1234-0123-abcd-5678-1234abcd-1",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}
2. To get the result of the update-broker-type operation, run the following command, replacing ClusterOperationArn with the ARN that you obtained in the output of the update-broker-type command.
aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
"ClusterOperationInfo": {
"ClientRequestId": "982168a3-939f-11e9-8a62-538df00285db",
"ClusterArn": "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/
abcd1234-0123-abcd-5678-1234abcd-1",
"CreationTime": "2021-01-09T02:24:22.198000+00:00",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/
exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "UPDATE_COMPLETE",
"OperationType": "UPDATE_BROKER_TYPE",
"SourceClusterInfo": {
"InstanceType": "t3.small"
},
"TargetClusterInfo": {
"InstanceType": "m5.large"
}
}
}
If OperationState has the value UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again.
Updating the broker size using the API
To update the broker size using the API, see UpdateBrokerType.
You can use UpdateBrokerType to update your cluster broker size from M5 or T3 to M7g, or from
M7g to M5.
Update the configuration of an Amazon MSK cluster
To update the configuration of a cluster, make sure that the cluster is in the ACTIVE state. You
must also ensure that the number of partitions per broker on your MSK cluster is under the limits described in the section called “Right-size your cluster: Number of partitions per broker”. You can't update the configuration of a cluster that exceeds these limits.
For information about MSK configuration, including how to create a custom configuration, which
properties you can update, and what happens when you update the configuration of an existing
cluster, see Amazon MSK configuration.
Updating the configuration of a cluster using the AWS CLI
1. Copy the following JSON and save it to a file. Name the file configuration-info.json.
Replace ConfigurationArn with the Amazon Resource Name (ARN) of the configuration that
you want to use to update the cluster. The ARN string must be in quotes in the following JSON.
Replace Configuration-Revision with the revision of the configuration that you want to
use. Configuration revisions are integers (whole numbers) that start at 1. This integer mustn't
be in quotes in the following JSON.
{
"Arn": ConfigurationArn,
"Revision": Configuration-Revision
}
2. Run the following command, replacing ClusterArn with the ARN that you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find it by listing all clusters. For more information, see the section called “List clusters”.
Replace Path-to-Config-Info-File with the path to your configuration info file. If you named the file that you created in the previous step configuration-info.json and saved it in the current directory, then Path-to-Config-Info-File is configuration-info.json.
Replace Current-Cluster-Version with the current version of the cluster.
Important
Cluster versions aren't simple integers. To find the current version of the cluster, use
the DescribeCluster operation or the describe-cluster AWS CLI command. An example
version is KTVPDKIKX0DER.
aws kafka update-cluster-configuration --cluster-arn ClusterArn --configuration-info file://Path-to-Config-Info-File --current-version Current-Cluster-Version
The following is an example of how to use this command:
aws kafka update-cluster-configuration --cluster-arn "arn:aws:kafka:us-east-1:0123456789012:cluster/exampleName/abcd1234-0123-abcd-5678-1234abcd-1" --configuration-info file://c:\users\tester\msk\configuration-info.json --current-version "K1X5R6FKA87"
The output of this update-cluster-configuration command looks like the following
JSON example.
{
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}
3. To get the result of the update-cluster-configuration operation, run the following command, replacing ClusterOperationArn with the ARN that you obtained in the output of the update-cluster-configuration command.
aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
"ClusterOperationInfo": {
"ClientRequestId": "982168a3-939f-11e9-8a62-538df00285db",
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/
exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
"CreationTime": "2019-06-20T21:08:57.735Z",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "UPDATE_COMPLETE",
"OperationType": "UPDATE_CLUSTER_CONFIGURATION",
"SourceClusterInfo": {},
"TargetClusterInfo": {
"ConfigurationInfo": {
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/
ExampleConfigurationName/abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
"Revision": 1
}
}
}
}
In this output, OperationType is UPDATE_CLUSTER_CONFIGURATION. If OperationState
has the value UPDATE_IN_PROGRESS, wait a while, then run the describe-cluster-
operation command again.
Update the configuration of an Amazon MSK cluster using the API
To use the API to update the configuration of an Amazon MSK cluster, see
UpdateClusterConfiguration.
Expand the number of brokers in an Amazon MSK cluster
Use this Amazon MSK operation when you want to increase the number of brokers in your MSK
cluster. To expand a cluster, make sure that it is in the ACTIVE state.
Important
If you want to expand an MSK cluster, make sure to use this Amazon MSK operation. Don't
try to add brokers to a cluster without using this operation.
For information about how to rebalance partitions after you add brokers to a cluster, see the
section called “Reassign partitions”.
Expand an Amazon MSK cluster using the AWS Management Console
This process describes how to increase the number of brokers in an Amazon MSK cluster using the
AWS Management Console.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster whose number of brokers you want to increase.
3. On the cluster details page, choose the Edit button next to the Cluster-Level Broker Details
heading.
4. Enter the number of brokers that you want the cluster to have per Availability Zone and then
choose Save changes.
Expand an Amazon MSK cluster using the AWS CLI
This process describes how to increase the number of brokers in an Amazon MSK cluster using the
AWS CLI.
1. Run the following command, replacing ClusterArn with the Amazon Resource Name
(ARN) that you obtained when you created your cluster. If you don't have the ARN for your
cluster, you can find it by listing all clusters. For more information, see the section called “List
clusters”.
Replace Current-Cluster-Version with the current version of the cluster.
Important
Cluster versions aren't simple integers. To find the current version of the cluster, use
the DescribeCluster operation or the describe-cluster AWS CLI command. An example
version is KTVPDKIKX0DER.
The Target-Number-of-Brokers parameter represents the total number of broker nodes
that you want the cluster to have when this operation completes successfully. The value you
specify for Target-Number-of-Brokers must be a whole number that is greater than
the current number of brokers in the cluster. It must also be a multiple of the number of
Availability Zones.
aws kafka update-broker-count --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-number-of-broker-nodes Target-Number-of-Brokers
The output of this update-broker-count operation looks like the following JSON.
{
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}
2. To get the result of the update-broker-count operation, run the following command, replacing ClusterOperationArn with the ARN that you obtained in the output of the update-broker-count command.
aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
"ClusterOperationInfo": {
"ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/
exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
"CreationTime": "2019-09-25T23:48:04.794Z",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "UPDATE_COMPLETE",
"OperationType": "INCREASE_BROKER_COUNT",
"SourceClusterInfo": {
"NumberOfBrokerNodes": 9
},
"TargetClusterInfo": {
"NumberOfBrokerNodes": 12
}
}
}
In this output, OperationType is INCREASE_BROKER_COUNT. If OperationState has the
value UPDATE_IN_PROGRESS, wait a while, then run the describe-cluster-operation
command again.
Expand an Amazon MSK cluster using the API
To increase the number of brokers in a cluster using the API, see UpdateBrokerCount.
Remove a broker from an Amazon MSK cluster
Use this Amazon MSK operation when you want to remove brokers from Amazon Managed
Streaming for Apache Kafka (MSK) provisioned clusters. You can reduce your cluster's storage and
compute capacity by removing sets of brokers, with no availability impact, data durability risk, or
disruption to your data streaming applications.
You can add more brokers to your cluster to handle increases in traffic, and remove brokers when
the traffic subsides. With broker addition and removal capability, you can best utilize your cluster
capacity and optimize your MSK infrastructure costs. Broker removal gives you broker-level control
over existing cluster capacity to fit your workload needs and avoid migration to another cluster.
Use the AWS Console, Command Line Interface (CLI), SDK, or AWS CloudFormation to reduce
broker count of your provisioned cluster. MSK picks the brokers that do not have any partitions on
them (except for canary topics) and prevents applications from producing data to those brokers,
while safely removing those brokers from the cluster.
To reduce a cluster's storage and compute, remove one broker per Availability Zone. For example, you can remove two brokers from a two-Availability-Zone cluster, or three brokers from a three-Availability-Zone cluster, in a single broker-removal operation.
For information about how to rebalance partitions after you remove brokers from a cluster, see the
section called “Reassign partitions”.
You can remove brokers from all M5 and M7g based MSK provisioned clusters, regardless of the
instance size.
Broker removal is supported on Kafka versions 2.8.1 and above, including on KRaft mode clusters.
Topics
Prepare to remove brokers by removing all partitions
Remove a broker with the AWS Management Console
Remove a broker with the AWS CLI
Remove a broker with the AWS API
Prepare to remove brokers by removing all partitions
Before you start the broker removal process, first move all partitions, except ones for topics
__amazon_msk_canary and __amazon_msk_canary_state, from the brokers you plan to
remove. These are internal topics that Amazon MSK creates for cluster health and diagnostic
metrics.
You can use Kafka admin APIs or Cruise Control to move partitions to other brokers that you intend
to retain in the cluster. See Reassign partitions.
Example process to remove partitions
This section is an example of how to remove partitions from the broker you intend to remove.
Assume you have a cluster with 6 brokers, 2 brokers in each AZ, and it has four topics:
__amazon_msk_canary
__consumer_offsets
__amazon_msk_connect_offsets_my-mskc-connector_12345678-09e7-c657f7e4ff32-2
msk-brk-rmv
1. Create a client machine as described in Create a client machine.
2. After configuring the client machine, run the following command to list all the available topics
in your cluster.
./bin/kafka-topics.sh --bootstrap-server "CLUSTER_BOOTSTRAP_STRING" --list
In this example, we see four topic names: __amazon_msk_canary, __consumer_offsets, __amazon_msk_connect_offsets_my-mskc-connector_12345678-09e7-c657f7e4ff32-2, and msk-brk-rmv.
3. Create a JSON file called topics.json on the client machine and add all the user topic names, as in the following code example. You don't need to include the __amazon_msk_canary topic name, as this is a service-managed topic that will be automatically moved when necessary.
{
"topics": [
{"topic": "msk-brk-rmv"},
{"topic": "__consumer_offsets"},
{"topic": "__amazon_msk_connect_offsets_my-mskc-connector_12345678-09e7-
c657f7e4ff32-2"}
],
"version":1
}
4. Run the following command to generate a proposal to move partitions to only 3 brokers out of
6 brokers on the cluster.
./bin/kafka-reassign-partitions.sh --bootstrap-server "CLUSTER_BOOTSTRAP_STRING" --topics-to-move-json-file topics.json --broker-list 1,2,3 --generate
5. Create a file called reassignment-file.json and copy the proposed partition reassignment configuration that you got from the above command.
6. Run the following command to move the partitions that you specified in reassignment-file.json.

./bin/kafka-reassign-partitions.sh --bootstrap-server "CLUSTER_BOOTSTRAP_STRING" --reassignment-json-file reassignment-file.json --execute
The output looks similar to the following:
Successfully started partition reassignments for morpheus-test-topic-1-0,test-
topic-1-0
7. Run the following command to verify all partitions have moved.
./bin/kafka-reassign-partitions.sh --bootstrap-server "CLUSTER_BOOTSTRAP_STRING" --reassignment-json-file reassignment-file.json --verify
The output looks similar to the following. Monitor the status until all partitions in your requested topics have been reassigned successfully:
Status of partition reassignment:
Reassignment of partition msk-brk-rmv-0 is completed.
Reassignment of partition msk-brk-rmv-1 is completed.
Reassignment of partition __consumer_offsets-0 is completed.
Reassignment of partition __consumer_offsets-1 is completed.
8. When the status indicates that the partition reassignment for each partition is completed, monitor the UserPartitionExists metric for 5 minutes to ensure that it displays 0 for the brokers from which you moved the partitions. After confirming this, you can proceed to remove the broker from the cluster.
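As a sketch, you might poll that metric for one of the affected brokers with CloudWatch; the dimension names follow the AWS/Kafka per-broker convention, and the cluster name, broker ID, and time range are placeholders.

aws cloudwatch get-metric-statistics --namespace AWS/Kafka --metric-name UserPartitionExists --dimensions Name="Cluster Name",Value="your-cluster-name" Name="Broker ID",Value="5" --statistics Maximum --period 60 --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T00:05:00Z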
Remove a broker with the AWS Management Console
To remove brokers with the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster that contains brokers you want to remove.
3. On the cluster details page, choose the Actions button and select the Edit number of brokers
option.
4. Enter the number of brokers that you want the cluster to have per Availability Zone. The console summarizes the number of brokers across Availability Zones that will be removed. Make sure this is what you want.
5. Choose Save changes.
To prevent accidental broker removal, the console asks you to confirm that you want to delete
brokers.
Remove a broker with the AWS CLI
Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN)
that you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find it by listing all clusters. For more information, see the section called “List clusters”. Replace Current-Cluster-Version with the current version of the cluster.
Important
Cluster versions aren't simple integers. To find the current version of the cluster, use the
DescribeCluster operation or the describe-cluster AWS CLI command. An example version is
KTVPDKIKX0DER.
The Target-Number-of-Brokers parameter represents the total number of broker nodes that
you want the cluster to have when this operation completes successfully. The value you specify for
Target-Number-of-Brokers must be a whole number that is less than the current number of
brokers in the cluster. It must also be a multiple of the number of Availability Zones.
aws kafka update-broker-count --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-number-of-broker-nodes Target-Number-of-Brokers
The output of this update-broker-count operation looks like the following JSON.
{
"ClusterOperationInfo": {
"ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"CreationTime": "2019-09-25T23:48:04.794Z",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "UPDATE_COMPLETE",
"OperationType": "DECREASE_BROKER_COUNT",
"SourceClusterInfo": {
"NumberOfBrokerNodes": 12
},
"TargetClusterInfo": {
"NumberOfBrokerNodes": 9
}
}
}
In this output, OperationType is DECREASE_BROKER_COUNT. If OperationState has the value
UPDATE_IN_PROGRESS, wait a while, then run the describe-cluster-operation command
again.
Remove a broker with the AWS API
To remove brokers in a cluster using the API, see UpdateBrokerCount in the Amazon Managed
Streaming for Apache Kafka API Reference.
Update security settings of an Amazon MSK cluster
Use this Amazon MSK operation to update the authentication and client-broker encryption settings
of your MSK cluster. You can also update the private certificate authority used to sign certificates for mutual TLS authentication. You can't change the in-cluster (broker-to-broker) encryption setting.
The cluster must be in the ACTIVE state for you to update its security settings.
If you turn on authentication using IAM, SASL, or TLS, you must also turn on encryption between
clients and brokers. The following table shows the possible combinations.
Authentication     Client-broker encryption options     Broker-broker encryption
Unauthenticated    TLS, PLAINTEXT, TLS_PLAINTEXT        Can be on or off.
mTLS               TLS, TLS_PLAINTEXT                   Must be on.
SASL/SCRAM         TLS                                  Must be on.
SASL/IAM           TLS                                  Must be on.
When client-broker encryption is set to TLS_PLAINTEXT and client-authentication is set to mTLS,
Amazon MSK creates two types of listeners for clients to connect to: one listener for clients to
connect using mTLS authentication with TLS Encryption, and another for clients to connect
without authentication or encryption (plaintext).
For more information about security settings, see Security.
Update Amazon MSK cluster security settings using the AWS
Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster that you want to update.
3. In the Security settings section, choose Edit.
4. Choose the authentication and encryption settings that you want for the cluster, then choose
Save changes.
Updating Amazon MSK cluster security settings using the AWS CLI
1. Create a JSON file that contains the encryption settings that you want the cluster to have. The
following is an example.
Note
You can only update the client-broker encryption setting. You can't update the in-cluster (broker-to-broker) encryption setting.
{"EncryptionInTransit":{"ClientBroker": "TLS"}}
2. Create a JSON file that contains the authentication settings that you want the cluster to have.
The following is an example.
{"Sasl":{"Scram":{"Enabled":true}}}
3. Run the following AWS CLI command:
aws kafka update-security --cluster-arn ClusterArn --current-version Current-Cluster-Version --client-authentication file://Path-to-Authentication-Settings-JSON-File --encryption-info file://Path-to-Encryption-Settings-JSON-File
The output of this update-security operation looks like the following JSON.
{
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}
4. To see the status of the update-security operation, run the following command, replacing ClusterOperationArn with the ARN that you obtained in the output of the update-security command.
aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
"ClusterOperationInfo": {
"ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/
exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
"CreationTime": "2021-09-17T02:35:47.753000+00:00",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "PENDING",
"OperationType": "UPDATE_SECURITY",
"SourceClusterInfo": {},
"TargetClusterInfo": {}
}
}
If OperationState has the value PENDING or UPDATE_IN_PROGRESS, wait a while, then run
the describe-cluster-operation command again.
Note
The AWS CLI and API operations for updating the security settings of a cluster are
idempotent. This means that if you invoke the security update operation and specify an
authentication or encryption setting that is the same setting that the cluster currently has,
that setting won't change.
Updating a cluster's security settings using the API
To update the security settings for an Amazon MSK cluster using the API, see UpdateSecurity.
Note
The AWS CLI and API operations for updating the security settings of an MSK cluster are
idempotent. This means that if you invoke the security update operation and specify an
authentication or encryption setting that is the same setting that the cluster currently has,
that setting won't change.
Reboot a broker for an Amazon MSK cluster
Use this Amazon MSK operation when you want to reboot a broker for your MSK cluster. To reboot a broker for a cluster, make sure that the cluster is in the ACTIVE state.
The Amazon MSK service may reboot the brokers for your MSK cluster during system maintenance,
such as patching or version upgrades. Rebooting a broker manually lets you test the resilience of your Kafka clients to determine how they respond to system maintenance.
Reboot a broker for an Amazon MSK cluster using the AWS
Management Console
This process describes how to reboot a broker for an Amazon MSK cluster using the AWS Management Console.
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster whose broker you want to reboot.
3. Scroll down to the Broker details section, and choose the broker you want to reboot.
4. Choose the Reboot broker button.
Reboot a broker for an Amazon MSK cluster using the AWS CLI
This process describes how to reboot a broker for an Amazon MSK cluster using the AWS CLI.
1. Run the following command, replacing ClusterArn with the Amazon Resource Name (ARN) that you obtained when you created your cluster, and BrokerId with the ID of the broker that you want to reboot.
Note
The reboot-broker operation only supports rebooting one broker at a time.
If you don't have the ARN for your cluster, you can find it by listing all clusters. For more
information, see the section called “List clusters”.
If you don't have the broker IDs for your cluster, you can find them by listing the broker nodes.
For more information, see list-nodes.
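For example, this sketch lists the broker IDs (the query path assumes the ListNodes response shape):

aws kafka list-nodes --cluster-arn ClusterArn --query 'NodeInfoList[].BrokerNodeInfo.BrokerId'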
aws kafka reboot-broker --cluster-arn ClusterArn --broker-ids BrokerId
The output of this reboot-broker operation looks like the following JSON.
{
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/
abcdefab-1234-abcd-5678-cdef0123ab01-2",
"ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef"
}
2. To get the result of the reboot-broker operation, run the following command, replacing ClusterOperationArn with the ARN that you obtained in the output of the reboot-broker command.
aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
"ClusterOperationInfo": {
"ClientRequestId": "c0b7af47-8591-45b5-9c0c-909a1a2c99ea",
"ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/
exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
"CreationTime": "2019-09-25T23:48:04.794Z",
"OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-
operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-
abcd-4f7f-1234-9876543210ef",
"OperationState": "REBOOT_IN_PROGRESS",
"OperationType": "REBOOT_NODE",
"SourceClusterInfo": {},
"TargetClusterInfo": {}
}
}
When the reboot operation is complete, the OperationState is REBOOT_COMPLETE.
Reboot a broker for an Amazon MSK cluster using the API
To reboot a broker in a cluster using the API, see RebootBroker.
Impact of broker restarts during patching and other
maintenance
Periodically, Amazon MSK updates software on your brokers. These updates have no impact on
your applications' writes and reads if you follow best practices.
Amazon MSK uses rolling updates for software to maintain high availability of your clusters. During
this process, brokers are rebooted one at a time, and Kafka automatically moves leadership to
another online broker. Kafka clients have built-in mechanisms to automatically detect the change
in leadership for the partitions and continue to write and read data into an MSK cluster.
Following a broker going offline, it is normal to see transient disconnect errors on your clients. You will also observe, for a brief window (up to 2 minutes, typically less), some spikes in p99 read and write latency (typically high milliseconds, up to ~2 seconds). These spikes are expected and are caused by the client reconnecting to the new leader broker; they don't affect your produce or consume requests and resolve after the reconnect. For more information, see Broker offline and client failover.
You will also observe an increase in the metric UnderReplicatedPartitions, which is expected as the partitions on the broker that was shut down are no longer replicating data. This has no impact on applications' writes and reads because replicas for these partitions that are hosted on other brokers now serve the requests.
After the software update, when the broker comes back online, it needs to "catch up" on the
messages produced while it was offline. During catch up, you may also observe an increase in usage
of the volume throughput and CPU. These should have no impact on writes and reads into the
cluster if you have enough CPU, memory, network, and volume resources on your brokers.
Tag an Amazon MSK cluster
You can assign your own metadata in the form of tags to an Amazon MSK resource, such as an
MSK cluster. A tag is a key-value pair that you define for the resource. Using tags is a simple yet
powerful way to manage AWS resources and organize data, including billing data.
Topics
Tag basics for Amazon MSK clusters
Track Amazon MSK cluster costs using tagging
Tag restrictions
Tag resources using the Amazon MSK API
Tag basics for Amazon MSK clusters
You can use the Amazon MSK API to complete the following tasks:
Add tags to an Amazon MSK resource.
List the tags for an Amazon MSK resource.
Remove tags from an Amazon MSK resource.
You can use tags to categorize your Amazon MSK resources. For example, you can categorize your
Amazon MSK clusters by purpose, owner, or environment. Because you define the key and value for
each tag, you can create a custom set of categories to meet your specific needs. For example, you
might define a set of tags that help you track clusters by owner and associated application.
The following are several examples of tags:
Project: Project name
Owner: Name
Purpose: Load testing
Environment: Production
Track Amazon MSK cluster costs using tagging
You can use tags to categorize and track your AWS costs. When you apply tags to your AWS
resources, including Amazon MSK clusters, your AWS cost allocation report includes usage and
costs aggregated by tags. You can organize your costs across multiple services by applying tags
that represent business categories (such as cost centers, application names, or owners). For more
information, see Use Cost Allocation Tags for Custom Billing Reports in the AWS Billing User Guide.
Tag restrictions
The following restrictions apply to tags in Amazon MSK.
Basic restrictions
The maximum number of tags per resource is 50.
Tag keys and values are case-sensitive.
You can't change or edit tags for a deleted resource.
Tag key restrictions
Each tag key must be unique. If you add a tag with a key that's already in use, your new tag
overwrites the existing key-value pair.
You can't start a tag key with aws: because this prefix is reserved for use by AWS. AWS creates
tags that begin with this prefix on your behalf, but you can't edit or delete them.
Tag keys must be between 1 and 128 Unicode characters in length.
Tag keys must consist of the following characters: Unicode letters, digits, white space, and the
following special characters: _ . / = + - @.
Tag value restrictions
Tag values must be between 0 and 255 Unicode characters in length.
Tag values can be blank. Otherwise, they must consist of the following characters: Unicode
letters, digits, white space, and any of the following special characters: _ . / = + - @.
Tag resources using the Amazon MSK API
You can use the following operations to tag or untag an Amazon MSK resource or to list the current
set of tags for a resource:
ListTagsForResource
TagResource
UntagResource
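For example, the equivalent AWS CLI calls look like the following sketch; the ARN and tag values are placeholders.

aws kafka tag-resource --resource-arn ClusterArn --tags "Environment=Production,Owner=Name"

aws kafka list-tags-for-resource --resource-arn ClusterArn

aws kafka untag-resource --resource-arn ClusterArn --tag-keys "Owner"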
Broker offline and client failover
Kafka tolerates an offline broker: in a healthy and balanced cluster that follows best practices, a single offline broker does not impact, or cause failures in, producing or consuming. This is because another broker takes over partition leadership and because the Kafka client library automatically fails over and starts sending requests to the new leader brokers.
Client-server contract
Failover relies on a shared contract between the client library and server-side behavior: the server must successfully assign one or more new leaders, and the client must switch brokers to send requests to the new leaders in a timely manner.
Kafka uses exceptions to control this flow:
An example procedure
1. Broker A enters an offline state.
2. Kafka client receives an exception (typically network disconnect or not_leader_for_partition).
3. These exceptions trigger the Kafka client to update its metadata so that it knows about the
latest leaders.
4. Kafka client resumes sending requests to the new partition leaders on other brokers.
This process typically takes less than 2 seconds with the vended Java client and default configurations. The client-side errors are verbose and repetitive but are not cause for concern, as denoted by the WARN level.
Example: Exception 1
10:05:25.306 [kafka-producer-network-thread | producer-1] WARN
o.a.k.c.producer.internals.Sender - [Producer clientId=producer-1] Got
error produce response with correlation id 864845 on topic-partition
msk-test-topic-1-0, retrying (2147483646 attempts left). Error:
NETWORK_EXCEPTION. Error Message: Disconnected from node 2
Example: Exception 2
10:05:25.306 [kafka-producer-network-thread | producer-1] WARN
o.a.k.c.producer.internals.Sender - [Producer clientId=producer-1] Received
invalid metadata error in produce request on partition msk-test-topic-1-41
due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For
requests intended only for the leader, this error indicates that the broker
is not the current leader. For requests intended for any replica, this
error indicates that the broker is not a replica of the topic partition..
Going to request metadata update now"
Kafka clients automatically resolve these errors, typically within 1 second and at most 3 seconds. This presents as produce/consume latency at p99 in client-side metrics (typically in the high hundreds of milliseconds). Anything longer than this typically indicates an issue with client configuration or server-side controller load. See the troubleshooting section.
A successful failover can be verified by checking that the BytesInPerSec and LeaderCount metrics increase on other brokers, which proves that the traffic and leadership moved as expected. You will also observe an increase in the UnderReplicatedPartitions metric, which is expected while replicas are offline with the shut-down broker.
Troubleshooting
The above flow can be disrupted by breaking the client-server contract. The most common causes of issues include:
Misconfiguration or incorrect usage of Kafka client libraries.
Unexpected default behaviors and bugs in third-party client libraries.
An overloaded controller, resulting in slower partition leader assignment.
A new controller being elected, resulting in slower partition leader assignment.
To ensure correct handling of leadership failover, we recommend the following (see the configuration sketch after this list):
Server-side best practices must be followed to ensure that the controller broker is scaled appropriately to avoid slow leadership assignment.
Client libraries must have retries enabled to ensure that the client handles the failover.
Client libraries must have retry.backoff.ms configured (default 100) to avoid connection/request storms.
Client libraries must set request.timeout.ms and delivery.timeout.ms to values in line with the applications' SLA. Higher values result in slower failover for certain failure types.
Client libraries must ensure that bootstrap.servers contains at least 3 random brokers to avoid an availability impact on initial discovery.
Some client libraries are lower level than others and expect the application developer to implement retry logic and exception handling themselves. Refer to the client library's documentation for example usage, and ensure that correct reconnect/retry logic is followed.
We recommend monitoring client-side latency for produce requests, successful request count, and error count for non-retryable errors.
We have observed that older third-party Golang and Ruby libraries remain verbose during an entire broker offline period even though produce and consume requests are unaffected. We recommend that you always monitor your business-level metrics, in addition to request metrics for successes and errors, to determine whether there is real impact versus noise in your logs.
Customers should not alarm on transient network/not_leader exceptions, as they are normal, non-impacting, and expected as part of the Kafka protocol.
Customers should not alarm on UnderReplicatedPartitions, as it is normal, non-impacting, and expected during a single offline broker.
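The following producer configuration snippet sketches these recommendations for the Apache Kafka Java client. The broker hostnames are placeholders, and the timeout values are illustrative; tune them to your application's SLA.

# At least 3 random brokers for initial discovery (placeholder hostnames)
bootstrap.servers=b-1.example.kafka.us-east-1.amazonaws.com:9092,b-2.example.kafka.us-east-1.amazonaws.com:9092,b-3.example.kafka.us-east-1.amazonaws.com:9092
# Retries enabled so the client handles failover (default in recent clients)
retries=2147483647
# Back off between attempts to avoid connection/request storms (default 100)
retry.backoff.ms=100
# Align timeouts with your application's SLA; higher values mean slower failover
request.timeout.ms=30000
delivery.timeout.ms=120000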
Amazon MSK configuration
Amazon Managed Streaming for Apache Kafka provides a default configuration for brokers,
topics, and Apache ZooKeeper nodes. You can also create custom configurations and use them to
create new MSK clusters or to update existing clusters. An MSK configuration consists of a set of
properties and their corresponding values.
Topics
Custom Amazon MSK configurations
Default Amazon MSK configuration
Guidelines for Amazon MSK tiered storage topic-level configuration
Amazon MSK configuration operations
Custom Amazon MSK configurations
You can use Amazon MSK to create a custom MSK configuration where you set the following
properties. Properties that you don't set explicitly get the values they have in the section called
“Default Amazon MSK configuration”. For more information about configuration properties, see
Apache Kafka Configuration.
Apache Kafka configuration properties
Name Description
allow.everyone.if.no.acl.found
If you want to set this property to false, first make sure you define Apache Kafka ACLs for your cluster. If you set this property to false and you don't first define Apache Kafka ACLs, you lose access to the cluster. If that happens, you can update the configuration again and set this property to true to regain access to the cluster.
auto.create.topics.enable Enables topic auto-creation on the server.
compression.type The final compression type for a given topic.
You can set this property to the standard
compression codecs (gzip, snappy, lz4, and
zstd). It additionally accepts uncompressed. This value is equivalent to no compression. If you set the value to producer, it means retain the original compression codec that the producer sets.
connections.max.idle.ms Idle connections timeout in milliseconds. The
server socket processor threads close the
connections that are idle for more than the
value that you set for this property.
default.replication.factor The default replication factor for automatically created topics.
delete.topic.enable Enables the delete topic operation. If you
turn off this setting, you can't delete a topic
through the admin tool.
group.initial.rebalance.delay.ms Amount of time the group coordinator waits
for more consumers to join a new group before the group coordinator performs the first rebalance. A longer delay means potentially fewer rebalances, but this increases the time until processing begins.
group.max.session.timeout.ms Maximum session timeout for registered
consumers. Longer timeouts give consumers
more time to process messages between
heartbeats at the cost of a longer time to
detect failures.
group.min.session.timeout.ms Minimum session timeout for registered
consumers. Shorter timeouts result in quicker
failure detection at the cost of more frequent
consumer heartbeats. This can overwhelm
broker resources.
leader.imbalance.per.broker.percentage The ratio of leader imbalance allowed per
broker. The controller triggers a leader balance
if it exceeds this value per broker. This value is specified as a percentage.
log.cleaner.delete.retention.ms Amount of time that you want Apache Kafka
to retain deleted records. The minimum value
is 0.
log.cleaner.min.cleanable.ratio This configuration property can have values
between 0 and 1. This value determines how
frequently the log compactor attempts to
clean the log (if log compaction is enabled).
By default, Apache Kafka avoids cleaning a
log if more than 50% of the log has been
compacted. This ratio bounds the maximum
space that the log wastes with duplicates (at
50%, this means at most 50% of the log could
be duplicates). A higher ratio means fewer,
more efficient cleanings, but more wasted
space in the log.
log.cleanup.policy The default cleanup policy for segments beyond the retention window. A comma-separated list of valid policies. Valid policies are delete and compact. For Tiered Storage enabled clusters, the only valid policy is delete.
log.flush.interval.messages Number of messages that accumulate on a log
partition before messages are flushed to disk.
log.flush.interval.ms Maximum time in milliseconds that a message in any topic remains in memory before it is flushed to disk. If you don't set this value, the value in log.flush.scheduler.interval.ms is used. The minimum value is 0.
log.message.timestamp.difference.max.ms The maximum time difference between the
timestamp when a broker receives a message
and the timestamp specified in the message.
If log.message.timestamp.type=CreateTime, a message is rejected if the difference in timestamp exceeds this threshold. This configuration is ignored if log.message.timestamp.type=LogAppendTime.
log.message.timestamp.type Specifies if the timestamp in the message is
the message creation time or the log append
time. The allowed values are CreateTime
and LogAppendTime .
log.retention.bytes Maximum size of the log before deleting it.
log.retention.hours Number of hours to keep a log file before
deleting it, tertiary to the log.retention.ms
property.
log.retention.minutes Number of minutes to keep a log file before
deleting it, secondary to log.retention.ms
property. If you don't set this value, the value
in log.retention.hours is used.
log.retention.ms Number of milliseconds to keep a log file before deleting it. If not set, the value in log.retention.minutes is used.
log.roll.ms Maximum time before a new log segment is
rolled out (in milliseconds). If you don't set
this property, the value in log.roll.hours is
used. The minimum possible value for this
property is 1.
log.segment.bytes Maximum size of a single log file.
max.incremental.fetch.session.cache.slots Maximum number of incremental fetch
sessions that are maintained.
message.max.bytes Largest record batch size that Kafka allows.
If you increase this value and there are
consumers older than 0.10.2, you must also
increase the fetch size of the consumers so
that they can fetch record batches this large.
The latest message format version always
groups messages into batches for efficiency.
Previous message format versions don't group
uncompressed records into batches, and in
such a case, this limit only applies to a single
record.
You can set this value per topic with the topic
level max.message.bytes config.
min.insync.replicas
When a producer sets acks to "all" (or "-1"),
the value in min.insync.replicas specifies
the minimum number of replicas that must
acknowledge a write for the write to be
considered successful. If this minimum cannot
be met, the producer raises an exception
(either NotEnoughReplicas or NotEnough
ReplicasAfterAppend).
You can use values in min.insync.replicas and
acks to enforce greater durability guarantees.
For example, you might create a topic with a
replication factor of 3, set min.insync.replica
s to 2, and produce with acks of "all". This
ensures that the producer raises an exception
if a majority of replicas don't receive a write.
num.io.threads The number of threads that the server uses for
processing requests, which may include disk I/O.
num.network.threads The number of threads that the server uses to
receive requests from the network and send
responses to it.
num.partitions Default number of log partitions per topic.
num.recovery.threads.per.data.dir The number of threads per data directory to
be used to recover logs at startup and to
flush them at shutdown.
num.replica.fetchers The number of fetcher threads used to
replicate messages from a source broker.
If you increase this value, you can increase
the degree of I/O parallelism in the follower
broker.
offsets.retention.minutes After a consumer group loses all its consumers
(that is, it becomes empty), its offsets are
kept for this retention period before getting
discarded. For standalone consumers (that
is, those that use manual assignment), offsets
expire after the time of the last commit plus
this retention period.
offsets.topic.replication.factor The replication factor for the offsets topic. Set
this value higher to ensure availability. Internal
topic creation fails until the cluster size meets
this replication factor requirement.
replica.fetch.max.bytes Number of bytes of messages to attempt to
fetch for each partition. This is not an absolute
maximum. If the first record batch in the first
non-empty partition of the fetch is larger
than this value, the record batch is returned
to ensure progress. The message.max.bytes
(broker config) or max.message.bytes (topic
config) defines the maximum record batch size
that the broker accepts.
replica.fetch.response.max.bytes The maximum number of bytes expected for
the entire fetch response. Records are fetched
in batches, and if the first record batch in the
first non-empty partition of the fetch is larger
than this value, the record batch will still be
returned to ensure progress. This isn't an
absolute maximum. The message.max.bytes
(broker config) or max.message.bytes (topic
config) properties specify the maximum record
batch size that the broker accepts.
replica.lag.time.max.ms If a follower hasn't sent any fetch requests or
hasn't consumed up to the leader's log end
offset for at least this number of milliseconds,
the leader removes the follower from the ISR.
MinValue = 10000
MaxValue = 30000
replica.selector.class The fully-qualified class name that implements
ReplicaSelector. The broker uses this value
to find the preferred read replica. If you
use Apache Kafka version 2.4.1 or higher,
and want to allow consumers to fetch
from the closest replica, set this property
to org.apache.kafka.common.replica.RackAwareReplicaSelector.
For more information, see the section called
“Apache Kafka version 2.4.1 (use 2.4.1.1 instead)”.
replica.socket.receive.buffer.bytes The socket receive buffer for network
requests.
socket.receive.buffer.bytes The SO_RCVBUF buffer of the socket server
sockets. The minimum value that you can
set for this property is -1. If the value is -1,
Amazon MSK uses the OS default.
socket.request.max.bytes The maximum number of bytes in a socket
request.
socket.send.buffer.bytes The SO_SNDBUF buffer of the socket server
sockets. The minimum value that you can
set for this property is -1. If the value is -1,
Amazon MSK uses the OS default.
transaction.max.timeout.ms Maximum timeout for transactions. If the
requested transaction time of a client exceeds
this value, the broker returns an error in
InitProducerIdRequest. This prevents a client
from using too large a timeout, which can stall
consumers that read from topics included in
the transaction.
transaction.state.log.min.isr Overridden min.insync.replicas configuration
for the transaction topic.
transaction.state.log.replication.factor The replication factor for the transaction
topic. Set this property to a higher value to
increase availability. Internal topic creation
fails until the cluster size meets this
replication factor requirement.
transactional.id.expiration.ms The time in milliseconds that the transaction
coordinator waits to receive any transaction
status updates for the current transaction
before the coordinator expires its transactional
ID. This setting also influences producer
ID expiration because it causes producer IDs
to expire when this time elapses after the last
write with the given producer ID. Producer
IDs might expire sooner if the last write from
the producer ID is deleted because of the
retention settings for the topic. The minimum
value for this property is 1 millisecond.
unclean.leader.election.enable Indicates if replicas not in the ISR set should
serve as leader as a last resort, even though
this might result in data loss.
zookeeper.connection.timeout.ms Applies to ZooKeeper mode clusters. The maximum
time that the client waits to establish a connection to
ZooKeeper. If you don't set this value, the
value in zookeeper.session.timeout.ms is used.
MinValue = 6000
MaxValue (inclusive) = 18000
zookeeper.session.timeout.ms Applies to ZooKeeper mode clusters. The Apache
ZooKeeper session timeout in milliseconds.
MinValue = 6000
MaxValue (inclusive) = 18000
To learn how you can create a custom MSK configuration, list all configurations, or describe them,
see the section called “Amazon MSK configuration operations”. To create an MSK cluster with a
custom MSK configuration, or to update a cluster with a new custom configuration, see How it
works.
When you update your existing MSK cluster with a custom MSK configuration, Amazon MSK does
rolling restarts when necessary, and uses best practices to minimize customer downtime. For
example, after Amazon MSK restarts each broker, Amazon MSK tries to let the broker catch up on
data that the broker might have missed during the configuration update before it moves to the
next broker.
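For example, the following AWS CLI sketch applies revision 2 of an existing custom configuration to a cluster. The cluster ARN, configuration ARN, revision number, and cluster version are placeholders; you can get the current cluster version from the output of the describe-cluster command.

aws kafka update-cluster-configuration --cluster-arn cluster-arn --configuration-info '{"Arn": "configuration-arn", "Revision": 2}' --current-version current-cluster-version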
Dynamic Amazon MSK configuration
In addition to the configuration properties that Amazon MSK provides, you can dynamically set
cluster-level and broker-level configuration properties that don't require a broker restart. These
are the properties that aren't marked as read-only in the table under Broker Configs in the Apache
Kafka documentation. For information on dynamic configuration and example commands, see
Updating Broker Configs in the Apache Kafka documentation.
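As a sketch, the following command uses the kafka-configs.sh tool that ships with Apache Kafka to change a cluster-wide dynamic default without a broker restart. The bootstrap-server string is a placeholder, and log.cleanup.policy is just one example of a dynamically updatable property.

<path-to-your-kafka-installation>/bin/kafka-configs.sh --bootstrap-server <bootstrap_server_string> --alter --entity-type brokers --entity-default --add-config log.cleanup.policy=delete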
Note
You can set the advertised.listeners property, but not the listeners property.
Topic-level Amazon MSK configuration
You can use Apache Kafka commands to set or modify topic-level configuration properties for new
and existing topics. For more information on topic-level configuration properties and examples on
how to set them, see Topic-Level Configs in the Apache Kafka documentation.
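As a sketch, assuming an existing topic named <topic_name> and a reachable bootstrap-server string, the following kafka-configs.sh command sets a topic-level retention period of 3 days (259,200,000 milliseconds):

<path-to-your-kafka-installation>/bin/kafka-configs.sh --bootstrap-server <bootstrap_server_string> --alter --entity-type topics --entity-name <topic_name> --add-config retention.ms=259200000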
Amazon MSK configuration states
An Amazon MSK configuration can be in one of the following states. To perform an operation on a
configuration, the configuration must be in the ACTIVE or DELETE_FAILED state:
ACTIVE
DELETING
DELETE_FAILED
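For example, you can check the state of a configuration with the following sketch, replacing configuration-arn with your configuration's ARN:

aws kafka describe-configuration --arn configuration-arn --query State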
Default Amazon MSK configuration
When you create an MSK cluster and don't specify a custom MSK configuration, Amazon MSK
creates and uses a default configuration with the values shown in the following table. For
properties that aren't in this table, Amazon MSK uses the defaults associated with your version of
Apache Kafka. For a list of these default values, see Apache Kafka Configuration.
Default configuration values
allow.everyone.if.no.acl.found
Description: If no resource patterns match a specific resource, the resource has no associated ACLs. In this case, if you set this property to true, all users can access the resource, not just the super users.
Default value for non-tiered storage cluster: true
Default value for tiered storage-enabled cluster: true

auto.create.topics.enable
Description: Enables autocreation of a topic on the server.
Default value for non-tiered storage cluster: false
Default value for tiered storage-enabled cluster: false

auto.leader.rebalance.enable
Description: Enables auto leader balancing. A background thread checks and initiates leader balance at regular intervals, if necessary.
Default value for non-tiered storage cluster: true
Default value for tiered storage-enabled cluster: true

default.replication.factor
Description: Default replication factor for automatically created topics.
Default value for non-tiered storage cluster: 3 for clusters in 3 Availability Zones, and 2 for clusters in 2 Availability Zones
Default value for tiered storage-enabled cluster: 3 for clusters in 3 Availability Zones, and 2 for clusters in 2 Availability Zones

local.retention.bytes
Description: The maximum size of local log segments for a partition before the old segments are deleted. If you don't set this value, the value in log.retention.bytes is used. The effective value should always be less than or equal to the log.retention.bytes value. The default value of -2 indicates that there is no limit on local retention; this corresponds to the retention.ms/bytes setting of -1. The properties local.retention.ms and local.retention.bytes are similar to log.retention, as they are used to determine how long the log segments should remain in local storage. Existing log.retention.* configurations are retention configurations for the topic partition, which includes both local and remote storage. Valid values: integers in [-2; +Inf].
Default value for non-tiered storage cluster: -2 for unlimited
Default value for tiered storage-enabled cluster: -2 for unlimited

local.retention.ms
Description: The number of milliseconds to retain the local log segment before deletion. If you don't set this value, Amazon MSK uses the value in log.retention.ms. The effective value should always be less than or equal to the log.retention.ms value. The default value of -2 indicates that there is no limit on local retention; this corresponds to the retention.ms/bytes setting of -1. The values local.retention.ms and local.retention.bytes are similar to log.retention; MSK uses this configuration to determine how long the log segments should remain in local storage. Existing log.retention.* configurations are retention configurations for the topic partition, which includes both local and remote storage. Valid values: integers in [-2; +Inf].
Default value for non-tiered storage cluster: -2 for unlimited
Default value for tiered storage-enabled cluster: -2 for unlimited

log.message.timestamp.difference.max.ms
Description: The maximum difference allowed between the timestamp when a broker receives a message and the timestamp specified in the message. If log.message.timestamp.type=CreateTime, a message is rejected if the difference in timestamp exceeds this threshold. This configuration is ignored if log.message.timestamp.type=LogAppendTime. The maximum timestamp difference allowed should be no greater than log.retention.ms to avoid unnecessarily frequent log rolling.
Default value for non-tiered storage cluster: 9223372036854775807
Default value for tiered storage-enabled cluster: 86400000 for Kafka 2.8.2.tiered

log.segment.bytes
Description: The maximum size of a single log file.
Default value for non-tiered storage cluster: 1073741824
Default value for tiered storage-enabled cluster: 134217728

min.insync.replicas
Description: When a producer sets acks (the acknowledgement that the producer gets from the Kafka broker) to "all" (or "-1"), the value in min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum can't be met, the producer raises an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend). When you use the values in min.insync.replicas and acks together, you can enforce greater durability guarantees. For example, you might create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This ensures that the producer raises an exception if a majority of replicas don't receive a write.
Default value for non-tiered storage cluster: 2 for clusters in 3 Availability Zones, and 1 for clusters in 2 Availability Zones
Default value for tiered storage-enabled cluster: 2 for clusters in 3 Availability Zones, and 1 for clusters in 2 Availability Zones

num.io.threads
Description: The number of threads that the server uses for processing requests, which may include disk I/O.
Default value for non-tiered storage cluster: 8
Default value for tiered storage-enabled cluster: max(8, vCPUs), where vCPUs depends on the instance size of the broker

num.network.threads
Description: The number of threads that the server uses to receive requests from the network and send responses to the network.
Default value for non-tiered storage cluster: 5
Default value for tiered storage-enabled cluster: max(5, vCPUs / 2), where vCPUs depends on the instance size of the broker

num.partitions
Description: Default number of log partitions per topic.
Default value for non-tiered storage cluster: 1
Default value for tiered storage-enabled cluster: 1

num.replica.fetchers
Description: The number of fetcher threads used to replicate messages from a source broker. If you increase this value, you can increase the degree of I/O parallelism in the follower broker.
Default value for non-tiered storage cluster: 2
Default value for tiered storage-enabled cluster: max(2, vCPUs / 4), where vCPUs depends on the instance size of the broker

remote.log.msk.disable.policy
Description: Used with remote.storage.enable to disable tiered storage. Set this policy to Delete to indicate that data in tiered storage is deleted when you set remote.storage.enable to false.
Default value for non-tiered storage cluster: N/A
Default value for tiered storage-enabled cluster: DELETE

remote.log.reader.threads
Description: Remote log reader thread pool size, which is used in scheduling tasks to fetch data from remote storage.
Default value for non-tiered storage cluster: N/A
Default value for tiered storage-enabled cluster: max(10, vCPUs * 0.67), where vCPUs depends on the instance size of the broker

remote.storage.enable
Description: Enables tiered (remote) storage for a topic if set to true. Disables topic-level tiered storage if set to false and remote.log.msk.disable.policy is set to Delete. When you disable tiered storage, you delete data from remote storage. After you disable tiered storage for a topic, you can't enable it again.
Default value for non-tiered storage cluster: false
Default value for tiered storage-enabled cluster: true

replica.lag.time.max.ms
Description: If a follower hasn't sent any fetch requests or hasn't consumed up to the leader's log end offset for at least this number of milliseconds, the leader removes the follower from the ISR.
Default value for non-tiered storage cluster: 30000
Default value for tiered storage-enabled cluster: 30000

retention.ms
Description: Mandatory field. The minimum time is 3 days. There is no default because the setting is mandatory. Amazon MSK uses the retention.ms value with local.retention.ms to determine when data moves from local to tiered storage. The local.retention.ms value specifies when to move data from local to tiered storage. The retention.ms value specifies when to remove data from tiered storage (that is, when it is removed from the cluster). Valid values: integers in [-1; +Inf].
Default value for non-tiered storage cluster: Minimum 259,200,000 milliseconds (3 days); -1 for infinite retention
Default value for tiered storage-enabled cluster: Minimum 259,200,000 milliseconds (3 days); -1 for infinite retention

socket.receive.buffer.bytes
Description: The SO_RCVBUF buffer of the socket server sockets. If the value is -1, the OS default is used.
Default value for non-tiered storage cluster: 102400
Default value for tiered storage-enabled cluster: 102400

socket.request.max.bytes
Description: Maximum number of bytes in a socket request.
Default value for non-tiered storage cluster: 104857600
Default value for tiered storage-enabled cluster: 104857600

socket.send.buffer.bytes
Description: The SO_SNDBUF buffer of the socket server sockets. If the value is -1, the OS default is used.
Default value for non-tiered storage cluster: 102400
Default value for tiered storage-enabled cluster: 102400

unclean.leader.election.enable
Description: Indicates whether you want replicas not in the ISR set to serve as leader as a last resort, even though this might result in data loss.
Default value for non-tiered storage cluster: true
Default value for tiered storage-enabled cluster: false

zookeeper.session.timeout.ms
Description: The Apache ZooKeeper session timeout in milliseconds.
Default value for non-tiered storage cluster: 18000
Default value for tiered storage-enabled cluster: 18000

zookeeper.set.acl
Description: Set client to use secure ACLs.
Default value for non-tiered storage cluster: false
Default value for tiered storage-enabled cluster: false
For information on how to specify custom configuration values, see the section called “Custom
Amazon MSK configurations”.
Guidelines for Amazon MSK tiered storage topic-level
configuration
The following are default settings and limitations when you configure tiered storage at the topic
level.
Amazon MSK doesn't support small log segment sizes for topics with tiered storage activated.
The minimum log segment size is 48 MiB, and the minimum segment roll time is 10 minutes.
These values map to the segment.bytes and segment.ms properties.
The value of local.retention.ms/bytes can't equal or exceed the retention.ms/bytes value, which
is the tiered storage retention setting.
The default value for local.retention.ms/bytes is -2. This means that the retention.ms value
is used for local.retention.ms/bytes. In this case, data remains in both local storage and tiered
storage (one copy in each), and the two copies expire together. For this option, a copy of the local
data is persisted to the remote storage. In this case, consume traffic reads data from the
local storage.
The default value for retention.ms is 7 days. There is no default size limit for retention.bytes.
The minimum value for retention.ms/bytes is -1. This means infinite retention.
The minimum value for local.retention.ms/bytes is -2. This means infinite retention for local
storage. It matches the retention.ms/bytes setting of -1.
The topic-level configuration retention.ms is mandatory for topics with tiered storage activated.
The minimum retention.ms is 3 days.
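To illustrate these guidelines, the following sketch creates a topic with tiered storage turned on. The topic name and values are illustrative, and they respect the minimums described above: a retention.ms of 7 days, a local.retention.ms of 1 day, and a 128 MiB segment size, which is above the 48 MiB minimum.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --bootstrap-server <bootstrap_server_string> --create --topic <topic_name> --partitions 6 --config remote.storage.enable=true --config retention.ms=604800000 --config local.retention.ms=86400000 --config segment.bytes=134217728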
Amazon MSK configuration operations
This topic describes how to create custom MSK configurations and how to perform operations on
them. For information about how to use MSK configurations to create or update clusters, see How
it works.
This topic contains the following sections:
Create an Amazon MSK configuration
Update an Amazon MSK configuration
Delete an Amazon MSK configuration
Get MSK configuration metadata
Get details about an Amazon MSK configuration revision
List all Amazon MSK configurations in your account for the current Region
Create an Amazon MSK configuration
This procedure describes how to create a custom Amazon MSK configuration.
1. Create a file where you specify the configuration properties that you want to set and the
values that you want to assign to them. The following are the contents of an example
configuration file.
auto.create.topics.enable = true
log.roll.ms = 604800000
2. Run the following AWS CLI command, and replace config-file-path with the path to the
file where you saved your configuration in the previous step.
Note
The name that you choose for your configuration must match the following regex:
"^[0-9A-Za-z][0-9A-Za-z-]{0,}$".
aws kafka create-configuration --name "ExampleConfigurationName" --description
"Example configuration description." --kafka-versions "1.1.1" --server-properties
fileb://config-file-path
The following is an example of a successful response after you run this command.
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e-1",
"CreationTime": "2019-05-21T19:37:40.626Z",
"LatestRevision": {
"CreationTime": "2019-05-21T19:37:40.626Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "ExampleConfigurationName"
}
3. The previous command returns an Amazon Resource Name (ARN) for your new configuration.
Save this ARN because you need it to refer to this configuration in other commands. If you lose
your configuration ARN, you can list all the configurations in your account to find it again.
Update an Amazon MSK configuration
This process describes how to update a custom Amazon MSK configuration.
1. Create a file where you specify the configuration properties that you want to update and
the values that you want to assign to them. The following are the contents of an example
configuration file.
auto.create.topics.enable = true
min.insync.replicas = 2
2. Run the following AWS CLI command, and replace config-file-path with the path to the
file where you saved your configuration in the previous step.
Replace configuration-arn with the ARN that you obtained when you created the
configuration. If you didn't save the ARN when you created the configuration, you can use the
list-configurations command to list all of the configurations in your account. The configuration
that you want appears in the response, along with its ARN.
aws kafka update-configuration --arn configuration-arn --description "Example
configuration revision description." --server-properties fileb://config-file-path
3. The following is an example of a successful response after you run this command.
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e-1",
"LatestRevision": {
"CreationTime": "2020-08-27T19:37:40.626Z",
"Description": "Example configuration revision description.",
"Revision": 2
}
}
Delete an Amazon MSK configuration
The following procedure shows how to delete a configuration that isn't attached to a cluster. You
can't delete a configuration that's attached to a cluster.
1. To run this example, replace configuration-arn with the ARN that you obtained when you
created the configuration. If you didn't save the ARN when you created the configuration, you
can use the list-configurations command to list all of the configurations in your account. The
configuration that you want appears in the response, along with its ARN.
aws kafka delete-configuration --arn configuration-arn
2. The following is an example of a successful response after you run this command.
{
"arn": " arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e-1",
"state": "DELETING"
}
Get MSK configuration metadata
The following procedure shows how to describe an Amazon MSK configuration to get metadata
about the configuration.
1. The following command returns metadata about the configuration. To get a detailed
description of the configuration, run the describe-configuration-revision command.
To run this example, replace configuration-arn with the ARN that you obtained when you
created the configuration. If you didn't save the ARN when you created the configuration, you
can use the list-configurations command to list all of the configurations in your account. The
configuration that you want appears in the response, along with its ARN.
aws kafka describe-configuration --arn configuration-arn
2. The following is an example of a successful response after you run this command.
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-
abcd-1234-abcd-abcd123e8e8e-1",
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"KafkaVersions": [
"1.1.1"
],
"LatestRevision": {
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "SomeTest"
}
Get details about an Amazon MSK configuration revision
This procedure shows how to get a detailed description of an Amazon MSK configuration revision.
If you use the describe-configuration command to describe an MSK configuration, you see
the metadata of the configuration. To get a detailed description of the configuration, use the
describe-configuration-revision command.
Run the following command and replace configuration-arn with the ARN that you
obtained when you created the configuration. If you didn't save the ARN when you created
the configuration, you can use the list-configurations command to list all of the
configurations in your account. The configuration that you want appears in the response,
along with its ARN.
aws kafka describe-configuration-revision --arn configuration-arn --revision 1
The following is an example of a successful response after you run this command.
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/abcdabcd-
abcd-1234-abcd-abcd123e8e8e-1",
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"Revision": 1,
"ServerProperties":
"YXV0by5jcmVhdGUudG9waWNzLmVuYWJsZSA9IHRydWUKCgp6b29rZWVwZXIuY29ubmVjdGlvbi50aW1lb3V0Lm1zID0gMTAwMAoKCmxvZy5yb2xsLm1zID0gNjA0ODAwMDAw"
}
The value of ServerProperties is encoded with base64. If you use a base64 decoder (for
example, https://www.base64decode.org/) to decode it manually, you get the contents of the
original configuration file that you used to create the custom configuration. In this case, you
get the following:
auto.create.topics.enable = true
zookeeper.connection.timeout.ms = 1000
log.roll.ms = 604800000
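Alternatively, you can decode the ServerProperties value in a single step from the command line. This sketch assumes a Unix-like shell with the base64 utility available:

aws kafka describe-configuration-revision --arn configuration-arn --revision 1 --query ServerProperties --output text | base64 --decode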
List all Amazon MSK configurations in your account for the current
Region
This process describes how to list all Amazon MSK configurations in your account for the current
AWS Region.
Run the following command.
aws kafka list-configurations
The following is an example of a successful response after you run this command.
{
"Configurations": [
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"KafkaVersions": [
"1.1.1"
],
"LatestRevision": {
"CreationTime": "2019-05-21T00:54:23.591Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "SomeTest"
},
{
"Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/SomeTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e-1",
"CreationTime": "2019-05-03T23:08:29.446Z",
"Description": "Example configuration description.",
"KafkaVersions": [
"1.1.1"
],
"LatestRevision": {
"CreationTime": "2019-05-03T23:08:29.446Z",
"Description": "Example configuration description.",
"Revision": 1
},
"Name": "ExampleConfigurationName"
}
]
}
What is MSK Serverless?
Note
MSK Serverless is available in the US East (Ohio), US East (N. Virginia), US West (Oregon),
Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney),
Asia Pacific (Tokyo), Asia Pacific (Seoul), Europe (Frankfurt), Europe (Stockholm), Europe
(Ireland), Europe (Paris), and Europe (London) Regions.
MSK Serverless is a cluster type for Amazon MSK that makes it possible for you to run Apache
Kafka without having to manage and scale cluster capacity. It automatically provisions and scales
capacity while managing the partitions in your topic, so you can stream data without thinking
about right-sizing or scaling clusters. MSK Serverless offers a throughput-based pricing model, so
you pay only for what you use. Consider using a serverless cluster if your applications need on-
demand streaming capacity that scales up and down automatically.
MSK Serverless is fully compatible with Apache Kafka, so you can use any compatible client
applications to produce and consume data. It also integrates with the following services:
AWS PrivateLink to provide private connectivity
AWS Identity and Access Management (IAM) for authentication and authorization using Java and
non-Java languages. For instructions on configuring clients for IAM, see Configure clients for IAM
access control.
AWS Glue Schema Registry for schema management
Amazon Managed Service for Apache Flink for Apache Flink-based stream processing
AWS Lambda for event processing
Note
MSK Serverless requires IAM access control for all clusters. Apache Kafka access control lists
(ACLs) are not supported. For more information, see the section called “IAM access control”.
For information about the service quotas that apply to MSK Serverless, see the section
called “Quota for serverless clusters”.
To help you get started with serverless clusters, and to learn more about configuration and
monitoring options for serverless clusters, see the following.
Topics
Use MSK Serverless clusters
Configuration properties for MSK Serverless clusters
Monitor MSK Serverless clusters
Use MSK Serverless clusters
This tutorial shows you an example of how you can create an MSK Serverless cluster, create a client
machine that can access it, and use the client to create topics on the cluster and to write data to
those topics. This exercise doesn't represent all the options that you can choose when you create
a serverless cluster. In different parts of this exercise, we choose default options for simplicity.
This doesn't mean that they're the only options that work for setting up a serverless cluster. You
can also use the AWS CLI or the Amazon MSK API. For more information, see the Amazon MSK API
Reference 2.0.
Topics
Create an MSK Serverless cluster
Create an IAM role for topics on MSK Serverless cluster
Create a client machine to access MSK Serverless cluster
Create an Apache Kafka topic
Produce and consume data in MSK Serverless
Delete resources that you created for MSK Serverless
Create an MSK Serverless cluster
In this step, you perform two tasks. First, you create an MSK Serverless cluster with default
settings. Second, you gather information about the cluster. This is information that you need in
later steps when you create a client that can send data to the cluster.
To create a serverless cluster
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home.
2. Choose Create cluster.
3. For Creation method, leave the Quick create option selected. The Quick create option lets
you create a serverless cluster with default settings.
4. For Cluster name, enter a descriptive name, such as msk-serverless-tutorial-cluster.
5. For General cluster properties, choose Serverless as the Cluster type. Use the default values
for the remaining General cluster properties.
6. Note the table under All cluster settings. This table lists the default values for important
settings such as networking and availability, and indicates whether you can change each
setting after you create the cluster. To change a setting before you create the cluster, you
should choose the Custom create option under Creation method.
Note
You can connect clients from up to five different VPCs with MSK Serverless clusters.
To help client applications switch over to another Availability Zone in the event of an
outage, you must specify at least two subnets in each VPC.
7. Choose Create cluster.
To gather information about the cluster
1. In the Cluster summary section, choose View client information. This button remains grayed
out until Amazon MSK finishes creating the cluster. You might need to wait a few minutes until
the button becomes active so you can use it.
2. Copy the string under the label Endpoint. This is your bootstrap server string.
3. Choose the Properties tab.
4. Under the Networking settings section, copy the IDs of the subnets and the security group
and save them because you need this information later to create a client machine.
5. Choose any of the subnets. This opens the Amazon VPC console. Find the ID of the Amazon
VPC that is associated with the subnet. Save this Amazon VPC ID for later use.
Next Step
Create an IAM role for topics on MSK Serverless cluster
Create an IAM role for topics on MSK Serverless cluster
In this step, you perform two tasks. The first task is to create an IAM policy that grants access to
create topics on the cluster and to send data to those topics. The second task is to create an IAM
role and associate this policy with it. In a later step, we create a client machine that assumes this
role and uses it to create a topic on the cluster and to send data to that topic.
To create an IAM policy that makes it possible to create topics and write to them
1. Open the IAM console at https://console.aws.amazon.com/iam/.
2. On the navigation pane, choose Policies.
3. Choose Create Policy.
4. Choose the JSON tab, then replace the JSON in the editor window with the following JSON.
Replace region with the code of the AWS Region where you created your cluster. Replace
Account-ID with your account ID. Replace msk-serverless-tutorial-cluster with the
name of your serverless cluster.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kafka-cluster:Connect",
"kafka-cluster:AlterCluster",
"kafka-cluster:DescribeCluster"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:cluster/msk-serverless-tutorial-
cluster/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:*Topic*",
"kafka-cluster:WriteData",
"kafka-cluster:ReadData"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:topic/msk-serverless-tutorial-
cluster/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:AlterGroup",
"kafka-cluster:DescribeGroup"
],
"Resource": [
"arn:aws:kafka:region:Account-ID:group/msk-serverless-tutorial-
cluster/*"
]
}
]
}
For instructions on how to write secure policies, see the section called “IAM access control”.
5. Choose Next: Tags.
6. Choose Next: Review.
7. For the policy name, enter a descriptive name, such as msk-serverless-tutorial-policy.
8. Choose Create policy.
To create an IAM role and attach the policy to it
1. On the navigation pane, choose Roles.
2. Choose Create role.
3. Under Common use cases, choose EC2, then choose Next: Permissions.
4. In the search box, enter the name of the policy that you previously created for this tutorial.
Then select the box to the left of the policy.
5. Choose Next: Tags.
6. Choose Next: Review.
7. For the role name, enter a descriptive name, such as msk-serverless-tutorial-role.
8. Choose Create role.
Next Step
Create a client machine to access MSK Serverless cluster
Create a client machine to access MSK Serverless cluster
In this step, you perform two tasks. The first task is to create an Amazon EC2 instance to use as
an Apache Kafka client machine. The second task is to install Java and Apache Kafka tools on the
machine.
To create a client machine
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. Choose Launch instance.
3. Enter a descriptive Name for your client machine, such as msk-serverless-tutorial-client.
4. Leave the Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type selected for Amazon
Machine Image (AMI) type.
5. Leave the t2.micro instance type selected.
6. Under Key pair (login), choose Create a new key pair. Enter MSKServerlessKeyPair for
Key pair name. Then choose Download Key Pair. Alternatively, you can use an existing key
pair.
7. For Network settings, choose Edit.
8. Under VPC, enter the ID of the virtual private cloud (VPC) for your serverless cluster. This is
the Amazon VPC whose ID you saved after you created the cluster.
9. For Subnet, choose the subnet whose ID you saved after you created the cluster.
10. For Firewall (security groups), select the security group associated with the cluster. This value
works if that security group has an inbound rule that allows traffic from the security group to
itself; a sketch of a command that adds such a rule appears after this procedure. With such a
rule, members of the same security group can communicate with each other. For more
information, see Security group rules in the Amazon VPC Developer Guide.
11. Expand the Advanced details section and choose the IAM role that you created in Create an
IAM role for topics on MSK Serverless cluster.
12. Choose Launch.
13. In the left navigation pane, choose Instances. Then choose the check box in the row that
represents your newly created Amazon EC2 instance. From this point forward, we call this
instance the client machine.
14. Choose Connect and follow the instructions to connect to the client machine.
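If the security group doesn't already have a self-referencing inbound rule, the following sketch adds one for port 9098, the port that MSK Serverless uses for IAM-authenticated clients. Replace both occurrences of the security group ID placeholder with the ID that you saved earlier.

aws ec2 authorize-security-group-ingress --group-id <security-group-id> --protocol tcp --port 9098 --source-group <security-group-id>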
To set up Apache Kafka client tools on the client machine
1. To install Java, run the following command on the client machine:
sudo yum -y install java-11
2. To get the Apache Kafka tools that we need to create topics and send data, run the following
commands:
wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz
tar -xzf kafka_2.12-2.8.1.tgz
3. Go to the kafka_2.12-2.8.1/libs directory, then run the following command to download
the Amazon MSK IAM JAR file. The Amazon MSK IAM JAR makes it possible for the client
machine to access the cluster.
wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.1/aws-msk-iam-auth-1.1.1-all.jar
4. Go to the kafka_2.12-2.8.1/bin directory. Copy the following property settings and paste
them into a new file. Name the file client.properties and save it.
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
Next Step
Create an Apache Kafka topic
Create an Apache Kafka topic
In this step, you use the previously created client machine to create a topic on the serverless
cluster.
To create a topic and write data to it
1. In the following export command, replace my-endpoint with the bootstrap-server string
that you saved after you created the cluster. Then, go to the kafka_2.12-2.8.1/bin
directory on the client machine and run the export command.
export BS=my-endpoint
2. Run the following command to create a topic called msk-serverless-tutorial.
<path-to-your-kafka-installation>/bin/kafka-topics.sh --bootstrap-server $BS --command-config client.properties --create --topic msk-serverless-tutorial --partitions 6
Next Step
Produce and consume data in MSK Serverless
Produce and consume data in MSK Serverless
In this step, you produce and consume data using the topic that you created in the previous step.
To produce and consume messages
1. Run the following command to create a console producer.
<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list $BS
--producer.config client.properties --topic msk-serverless-tutorial
2. Enter any message that you want, and press Enter. Repeat this step two or three times. Every
time you enter a line and press Enter, that line is sent to your cluster as a separate message.
3. Keep the connection to the client machine open, and then open a second, separate connection
to that machine in a new window.
4. Use your second connection to the client machine to create a console consumer with the
following command. Replace my-endpoint with the bootstrap server string that you saved
after you created the cluster.
<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-server my-endpoint --consumer.config client.properties --topic msk-serverless-tutorial --from-beginning
You start seeing the messages you entered earlier when you used the console producer
command.
5. Enter more messages in the producer window, and watch them appear in the consumer
window.
Next Step
Delete resources that you created for MSK Serverless
Delete resources that you created for MSK Serverless
In this step, you delete the resources that you created in this tutorial.
To delete the cluster
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/home.
2. In the list of clusters, choose the cluster that you created for this tutorial.
3. For Actions, choose Delete cluster.
4. Enter delete in the field, then choose Delete.
To stop the client machine
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. In the list of Amazon EC2 instances, choose the client machine that you created for this
tutorial.
3. Choose Instance state, then choose Terminate instance.
4. Choose Terminate.
To delete the IAM policy and role
1. Open the IAM console at https://console.aws.amazon.com/iam/.
2. On the navigation pane, choose Roles.
3. In the search box, enter the name of the IAM role that you created for this tutorial.
4. Choose the role. Then choose Delete role, and confirm the deletion.
5. On the navigation pane, choose Policies.
6. In the search box, enter the name of the policy that you created for this tutorial.
7. Choose the policy to open its summary page. On the policy's Summary page, choose Delete
policy.
8. Choose Delete.
Configuration properties for MSK Serverless clusters
Amazon MSK sets broker configuration properties for serverless clusters. You can't change these
broker configuration property settings. However, you can set or modify the following topic-level
configuration properties. All other topic-level configuration properties are not configurable.
cleanup.policy
Default: Delete. Editable: Yes, but only at topic creation time.

compression.type
Default: Producer. Editable: Yes.

max.message.bytes
Default: 1048588. Editable: Yes. Maximum allowed value: 8388608 (8 MiB).

message.timestamp.difference.max.ms
Default: long.max. Editable: Yes.

message.timestamp.type
Default: CreateTime. Editable: Yes.

retention.bytes
Default: 250 GiB. Editable: Yes. Maximum allowed value: Unlimited; set it to -1 for unlimited retention.

retention.ms
Default: 7 days. Editable: Yes. Maximum allowed value: Unlimited; set it to -1 for unlimited retention.
To set or modify these topic-level configuration properties, you can use Apache Kafka command
line tools. See 3.2 Topic-level Configs in the official Apache Kafka documentation for more
information and examples of how to set them.
When using the Apache Kafka command line tools with Amazon MSK Serverless, make sure you
completed steps 1-4 in the To set up Apache Kafka client tools on the client machine section of the
Amazon MSK Serverless Getting Started documentation. Additionally, you must include the
--command-config client.properties parameter in your commands.
For example, you can use the following command to modify the retention.bytes topic
configuration property to set unlimited retention:
<path-to-your-kafka-client-installation>/bin/kafka-configs.sh --bootstrap-server <bootstrap_server_string> --command-config client.properties --entity-type topics --entity-name <topic_name> --alter --add-config retention.bytes=-1
In this example, replace <bootstrap_server_string> with the bootstrap server endpoint for
your Amazon MSK Serverless cluster, and <topic_name> with the name of the topic that you
want to modify.
The --command-config client.properties parameter ensures that the Kafka command line
tool uses the appropriate configuration settings to communicate with your Amazon MSK Serverless
cluster.
Monitor MSK Serverless clusters
Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze
metrics for your MSK Serverless cluster. The metrics shown in the following table are available for
all serverless clusters. As these metrics are published as individual data points for each partition in
the topic, we recommend viewing them as a 'SUM' statistic to get the topic-level view.
Amazon MSK publishes PerSec metrics to CloudWatch at a frequency of once per minute. This
means that the 'SUM' statistic for a one-minute period accurately represents per-second data
for PerSec metrics. To collect per-second data for a period of longer than one minute, use the
following CloudWatch math expression: m1 * 60/PERIOD(m1).
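As a sketch of that expression in use, the following get-metric-data call sums the per-partition data points over five-minute periods and converts the sum back to a per-second rate. The cluster and topic names match the ones used in this tutorial and are otherwise placeholders.

aws cloudwatch get-metric-data --metric-data-queries file://queries.json --start-time 2024-01-01T00:00:00Z --end-time 2024-01-01T01:00:00Z

The file queries.json contains:

[
  {
    "Id": "m1",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/Kafka",
        "MetricName": "BytesInPerSec",
        "Dimensions": [
          { "Name": "Cluster Name", "Value": "msk-serverless-tutorial-cluster" },
          { "Name": "Topic", "Value": "msk-serverless-tutorial" }
        ]
      },
      "Period": 300,
      "Stat": "Sum"
    },
    "ReturnData": false
  },
  {
    "Id": "e1",
    "Expression": "m1 * 60 / PERIOD(m1)",
    "Label": "BytesInPerSec as a per-second rate"
  }
]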
Metrics available at the DEFAULT monitoring level
BytesInPerSec
When visible: After a producer writes to a topic
Dimensions: Cluster Name, Topic
Description: The number of bytes per second received from clients. This metric is available for each topic.

BytesOutPerSec
When visible: After a consumer group consumes from a topic
Dimensions: Cluster Name, Topic
Description: The number of bytes per second sent to clients. This metric is available for each topic.

FetchMessageConversionsPerSec
When visible: After a consumer group consumes from a topic
Dimensions: Cluster Name, Topic
Description: The number of fetch message conversions per second for the topic.

EstimatedMaxTimeLag
When visible: After a consumer group consumes from a topic
Dimensions: Cluster Name, Consumer Group, Topic
Description: A time estimate of the MaxOffsetLag metric.

MaxOffsetLag
When visible: After a consumer group consumes from a topic
Dimensions: Cluster Name, Consumer Group, Topic
Description: The maximum offset lag across all partitions in a topic.

MessagesInPerSec
When visible: After a producer writes to a topic
Dimensions: Cluster Name, Topic
Description: The number of incoming messages per second for the topic.

ProduceMessageConversionsPerSec
When visible: After a producer writes to a topic
Dimensions: Cluster Name, Topic
Description: The number of produce message conversions per second for the topic.

SumOffsetLag
When visible: After a consumer group consumes from a topic
Dimensions: Cluster Name, Consumer Group, Topic
Description: The aggregated offset lag for all the partitions in a topic.
To view MSK Serverless metrics
1. Sign in to the AWS Management Console and open the CloudWatch console at https://
console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, under Metrics, choose All metrics.
3. In the metrics search box, search for the term kafka.
4. Choose AWS/Kafka / Cluster Name, Topic or AWS/Kafka / Cluster Name, Consumer Group,
Topic to see different metrics.
Understand MSK Connect
MSK Connect is a feature of Amazon MSK that makes it easy for developers to stream data to
and from their Apache Kafka clusters. MSK Connect uses Kafka Connect 2.7.1, an open-source
framework for connecting Apache Kafka clusters with external systems such as databases, search
indexes, and file systems. With MSK Connect, you can deploy fully managed connectors built for
Kafka Connect that move data into or pull data from popular data stores like Amazon S3 and
Amazon OpenSearch Service. You can deploy connectors developed by third parties like Debezium
for streaming change logs from databases into an Apache Kafka cluster, or deploy an existing
connector with no code changes. Connectors automatically scale to adjust for changes in load and
you pay only for the resources that you use.
Use source connectors to import data from external systems into your topics. With sink connectors,
you can export data from your topics to external systems.
MSK Connect supports connectors for any Apache Kafka cluster with connectivity to an Amazon
VPC, whether it is an MSK cluster or an independently hosted Apache Kafka cluster.
MSK Connect continuously monitors connector health and delivery state, patches and manages the
underlying hardware, and autoscales the connectors to match changes in throughput.
To get started using MSK Connect, see the section called “Getting started”.
To learn about the AWS resources that you can create with MSK Connect, see the section called
“Understand connectors”, the section called “Create custom plugins”, and the section called
“Understand MSK Connect workers”.
For information about the MSK Connect API, see the Amazon MSK Connect API Reference.
Benefits of using Amazon MSK Connect
Apache Kafka is one of the most widely adopted open source streaming platforms for ingesting
and processing real-time data streams. With Apache Kafka, you can decouple and independently
scale your data-producing and data-consuming applications.
Kafka Connect is an important component of building and running streaming applications with
Apache Kafka. Kafka Connect provides a standardized way of moving data between Kafka and
external systems. Kafka Connect is highly scalable and can handle large volumes of data. Kafka
Connect provides a powerful set of API operations and tools for configuring, deploying, and
monitoring connectors that move data between Kafka topics and external systems. You can use
these tools to customize and extend the functionality of Kafka Connect to meet the specific needs
of your streaming application.
You might encounter challenges when you operate Apache Kafka Connect clusters on your
own, or when you try to migrate open source Apache Kafka Connect applications to AWS.
These challenges include the time required to set up infrastructure and deploy applications,
engineering obstacles when setting up self-managed Apache Kafka Connect clusters, and
administrative operational overhead.
To address these challenges, we recommend using Amazon Managed Streaming for Apache Kafka
Connect (Amazon MSK Connect) to migrate your open source Apache Kafka Connect applications
to AWS. Amazon MSK Connect simplifies using Kafka Connect to stream data to and from
Apache Kafka clusters and external systems, such as databases, search indexes, and file systems.
Here are some of the benefits of migrating to Amazon MSK Connect:
Elimination of operational overhead — Amazon MSK Connect takes away the operational
burden associated with patching, provisioning, and scaling of Apache Kafka Connect clusters.
Amazon MSK Connect continuously monitors the health of your Connect clusters and automates
patching and version upgrades without causing any disruptions to your workloads.
Automatic restarting of Connect tasks — Amazon MSK Connect can automatically recover
failed tasks to reduce production disruptions. Task failures can be caused by temporary errors,
such as breaching the TCP connection limit for Kafka, and task rebalancing when new workers
join the consumer group for sink connectors.
Automatic horizontal and vertical scaling — Amazon MSK Connect enables the connector
application to automatically scale to support higher throughputs. Amazon MSK Connect
manages scaling for you. You only need to specify the number of workers in the auto scaling
group and the utilization thresholds. You can use the Amazon MSK Connect UpdateConnector
API operation to scale the vCPUs up or down between 1 and 8 vCPUs to support variable
throughput.
Private network connectivity — Amazon MSK Connect privately connects to source and sink
systems by using AWS PrivateLink and private DNS names.
Getting started with MSK Connect
This is a step-by-step tutorial that uses the AWS Management Console to create an MSK cluster
and a sink connector that sends data from the cluster to an S3 bucket.
Topics
Set up resources required for MSK Connect
Create custom plugin
Create client machine and Apache Kafka topic
Create connector
Send data to the MSK cluster
Set up resources required for MSK Connect
In this step you create the following resources that you need for this getting-started scenario:
An S3 bucket to serve as the destination that receives data from the connector.
An MSK cluster to which you will send data. The connector will then read the data from this
cluster and send it to the destination S3 bucket.
An IAM role that allows the connector to write to the destination S3 bucket.
An Amazon VPC endpoint to make it possible to send data from the Amazon VPC that has the
cluster and the connector to Amazon S3.
To create the S3 bucket
1. Sign in to the AWS Management Console and open the Amazon S3 console at https://
console.aws.amazon.com/s3/.
2. Choose Create bucket.
3. For the name of the bucket, enter a descriptive name such as mkc-tutorial-destination-bucket.
4. Scroll down and choose Create bucket.
5. In the list of buckets, choose the newly created bucket.
6. Choose Create folder.
7. Enter tutorial for the name of the folder, then scroll down and choose Create folder.
To create the cluster
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/home?region=us-
east-1#/home/.
2. In the left pane, under MSK Clusters, choose Clusters.
3. Choose Create cluster.
4. Choose Custom create.
5. For the cluster name, enter mkc-tutorial-cluster.
6. Under General cluster properties, choose Provisioned for the cluster type.
7. Under Networking, choose an Amazon VPC. Then select the Availability Zones and subnets
that you want to use. Remember the IDs of the Amazon VPC and subnets that you selected
because you need them later in this tutorial.
8. Under Access control methods, ensure that only Unauthenticated access is selected.
9. Under Encryption, ensure that only Plaintext is selected.
10. Continue through the wizard and then choose Create cluster. This takes you to the details
page for the cluster. On that page, under Security groups applied, find the security group ID.
Remember that ID because you need it later in this tutorial.
To create the IAM role that can write to the destination bucket
1. Open the IAM console at https://console.aws.amazon.com/iam/.
2. In the left pane, under Access management, choose Roles.
3. Choose Create role.
4. Under Or select a service to view its use cases, choose S3.
5. Scroll down and under Select your use case, again choose S3.
6. Choose Next: Permissions.
7. Choose Create policy. This opens a new tab in your browser where you will create the policy.
Leave the original role-creation tab open because we'll get back to it later.
8. Choose the JSON tab, and then replace the text in the window with the following policy.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "arn:aws:s3:::*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::<my-tutorial-destination-bucket>"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts",
"s3:ListBucketMultipartUploads"
],
"Resource": "*"
}
]
}
9. Choose Next: Tags.
10. Choose Next: Review.
11. Enter mkc-tutorial-policy for the policy name, then scroll down and choose Create policy.
12. Back in the browser tab where you were creating the role, choose the refresh button.
13. Find the mkc-tutorial-policy and select it by choosing the button to its left.
14. Choose Next: Tags.
15. Choose Next: Review.
16. Enter mkc-tutorial-role for the role name, and delete the text in the description box.
17. Choose Create role.
To allow MSK Connect to assume the role
1. In the IAM console, in the left pane, under Access management, choose Roles.
2. Find the mkc-tutorial-role and choose it.
3. Under the role's Summary, choose the Trust relationships tab.
4. Choose Edit trust relationship.
5. Replace the existing trust policy with the following JSON.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "kafkaconnect.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
6. Choose Update Trust Policy.
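If you manage IAM with the AWS CLI instead, a minimal sketch of the same role setup might look
like the following. It assumes that policy.json and trust-policy.json contain the permissions
policy and trust policy shown above, and that <account-id> is a placeholder for your AWS account
ID.

aws iam create-policy --policy-name mkc-tutorial-policy --policy-document file://policy.json
aws iam create-role --role-name mkc-tutorial-role --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name mkc-tutorial-role --policy-arn arn:aws:iam::<account-id>:policy/mkc-tutorial-policy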
To create an Amazon VPC endpoint from the cluster's VPC to Amazon S3
1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
2. In the left pane, choose Endpoints.
3. Choose Create endpoint.
4. Under Service Name, choose the com.amazonaws.us-east-1.s3 service and the Gateway type.
5. Choose the cluster's VPC and then select the box to the left of the route table that is
associated with the cluster's subnets.
6. Choose Create endpoint.
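The same gateway endpoint can also be created with the AWS CLI, sketched below with placeholder
IDs (create-vpc-endpoint creates a gateway endpoint by default):

aws ec2 create-vpc-endpoint --vpc-id <vpc-id> --service-name com.amazonaws.us-east-1.s3 --route-table-ids <route-table-id>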
Next Step
Create custom plugin
Create custom plugin
A plugin contains the code that defines the logic of the connector. In this step you create a custom
plugin that has the code for the Lenses Amazon S3 Sink Connector. In a later step, when you create
the MSK connector, you specify that its code is in this custom plugin. You can use the same plugin
to create multiple MSK connectors with different configurations.
To create the custom plugin
1. Download the S3 connector.
2. Upload the ZIP file to an S3 bucket to which you have access. For information on how to
upload files to Amazon S3, see Uploading objects in the Amazon S3 User Guide.
3. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
4. In the left pane expand MSK Connect, then choose Custom plugins.
5. Choose Create custom plugin.
6. Choose Browse S3.
7. In the list of buckets find the bucket where you uploaded the ZIP file, and choose that bucket.
8. In the list of objects in the bucket, select the radio button to the left of the ZIP file, then
choose the button labeled Choose.
9. Enter mkc-tutorial-plugin for the custom plugin name, then choose Create custom plugin.
It might take AWS a few minutes to finish creating the custom plugin. When the creation process is
complete, you see the following message in a banner at the top of the browser window.
Custom plugin mkc-tutorial-plugin was successfully created
The custom plugin was created. You can now create a connector using this custom plugin.
Next Step
Create client machine and Apache Kafka topic
Create client machine and Apache Kafka topic
In this step you create an Amazon EC2 instance to use as an Apache Kafka client instance. You then
use this instance to create a topic on the cluster.
To create a client machine
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. Choose Launch instances.
3. Enter a Name for your client machine, such as mkc-tutorial-client.
4. Leave Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type selected for Amazon
Machine Image (AMI) type.
5. Choose the t2.xlarge instance type.
6. Under Key pair (login), choose Create a new key pair. Enter mkc-tutorial-key-pair for Key pair name, and then choose Download Key Pair. Alternatively, you can use an existing key pair.
7. Choose Launch instance.
8. Choose View Instances. Then, in the Security Groups column, choose the security group that
is associated with your new instance. Copy the ID of the security group, and save it for later.
To allow the newly created client to send data to the cluster
1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
2. In the left pane, under SECURITY, choose Security Groups. In the Security group ID column,
find the security group of the cluster. You saved the ID of this security group when you created
the cluster in the section called “Set up resources required for MSK Connect”. Choose this
security group by selecting the box to the left of its row. Make sure no other security groups
are simultaneously selected.
3. In the bottom half of the screen, choose the Inbound rules tab.
4. Choose Edit inbound rules.
5. In the bottom left of the screen, choose Add rule.
6. In the new rule, choose All traffic in the Type column. In the field to the right of the Source
column, enter the ID of the security group of the client machine. This is the security group ID
that you saved after you created the client machine.
7. Choose Save rules. Your MSK cluster will now accept all traffic from the client you created in
the previous procedure.
To create a topic
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. In the table of instances, choose mkc-tutorial-client.
3. Near the top of the screen, choose Connect, then follow the instructions to connect to the
instance.
4. Install Java on the client instance by running the following command:
sudo yum install java-1.8.0
5. Run the following command to download Apache Kafka.
wget https://archive.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
Note
If you want to use a mirror site other than the one used in this command, you can
choose a different one on the Apache website.
6. Run the following command in the directory where you downloaded the TAR file in the
previous step.
tar -xzf kafka_2.12-2.2.1.tgz
7. Go to the kafka_2.12-2.2.1 directory.
8. Open the Amazon MSK console at https://console.aws.amazon.com/msk/home?region=us-
east-1#/home/.
9. In the left pane, choose Clusters, then choose the name mkc-tutorial-cluster.
10. Choose View client information.
11. Copy the Plaintext connection string.
12. Choose Done.
13. Run the following command on the client instance (mkc-tutorial-client), replacing bootstrapServerString with the value that you saved when you viewed the cluster's client information.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-server bootstrapServerString --replication-factor 2 --partitions 1 --topic mkc-tutorial-topic
If the command succeeds, you see the following message: Created topic mkc-tutorial-topic.
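As an optional check, you can confirm that the topic now exists by listing the cluster's topics:

<path-to-your-kafka-installation>/bin/kafka-topics.sh --list --bootstrap-server bootstrapServerString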
Next Step
Create connector
Create connector
To create the connector
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. In the left pane, expand MSK Connect, then choose Connectors.
3. Choose Create connector.
4. In the list of plugins, choose mkc-tutorial-plugin, then choose Next.
5. For the connector name, enter mkc-tutorial-connector.
6. In the list of clusters, choose mkc-tutorial-cluster.
7. Copy the following configuration and paste it into the connector configuration field.
connector.class=io.confluent.connect.s3.S3SinkConnector
s3.region=us-east-1
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1
schema.compatibility=NONE
tasks.max=2
topics=mkc-tutorial-topic
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
storage.class=io.confluent.connect.s3.storage.S3Storage
s3.bucket.name=<my-tutorial-destination-bucket>
topics.dir=tutorial
8. Under Access permissions, choose mkc-tutorial-role.
9. Choose Next. On the Security page, choose Next again.
10. On the Logs page choose Next.
11. Under Review and create choose Create connector.
Next Step
Send data to the MSK cluster
Send data to the MSK cluster
In this step you send data to the Apache Kafka topic that you created earlier, and then look for that
same data in the destination S3 bucket.
To send data to the MSK cluster
1. In the bin folder of the Apache Kafka installation on the client instance, create a text file named client.properties with the following contents.
security.protocol=PLAINTEXT
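One way to create the file is with a single shell command, run from the bin folder:

echo "security.protocol=PLAINTEXT" > client.properties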
2. Run the following command to create a console producer. Replace BootstrapBrokerString with the plaintext connection string that you saved earlier when you viewed the cluster's client information.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list BootstrapBrokerString --producer.config client.properties --topic mkc-tutorial-topic
3. Enter any message that you want, and press Enter. Repeat this step two or three times. Every
time you enter a line and press Enter, that line is sent to your Apache Kafka cluster as a
separate message.
4. Look in the destination Amazon S3 bucket to find the messages that you sent in the previous
step.
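You can browse to the bucket in the Amazon S3 console, or list the delivered objects from the
AWS CLI, for example:

aws s3 ls s3://mkc-tutorial-destination-bucket/tutorial/ --recursive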
Understand connectors
A connector integrates external systems and Amazon services with Apache Kafka by continuously
copying streaming data from a data source into your Apache Kafka cluster, or continuously copying
data from your cluster into a data sink. A connector can also perform lightweight logic such as
data transformation, format conversion, or filtering before delivering the data to a destination.
Source connectors pull data from a data source and push this data into the cluster, while sink
connectors pull data from the cluster and push this data into a data sink.
The following diagram shows the architecture of a connector. A worker is a Java virtual machine
(JVM) process that runs the connector logic. Each worker creates a set of tasks that run in parallel
threads and do the work of copying the data. Tasks don't store state, and can therefore be started,
stopped, or restarted at any time in order to provide a resilient and scalable data pipeline.
Understand connector capacity
The total capacity of a connector depends on the number of workers that the connector has, as
well as on the number of MSK Connect Units (MCUs) per worker. Each MCU represents 1 vCPU
of compute and 4 GiB of memory. The MCU memory pertains to the total memory of a worker
instance and not the heap memory in use.
MSK Connect workers consume IP addresses in the customer-provided subnets. Each worker uses
one IP address from one of the customer-provided subnets. You should ensure that you have
enough available IP addresses in the subnets provided to a CreateConnector request to account for
their specified capacity, especially when autoscaling connectors where the number of workers can
fluctuate.
To create a connector, you must choose one of the following two capacity modes. A JSON sketch of both modes follows this list.
Provisioned - Choose this mode if you know the capacity requirements for your connector. You
specify two values:
The number of workers.
The number of MCUs per worker.
Autoscaled - Choose this mode if the capacity requirements for your connector are variable
or if you don't know them in advance. When you use autoscaled mode, Amazon MSK Connect
overrides your connector's tasks.max property with a value that is proportional to the number
of workers running in the connector and the number of MCUs per worker.
You specify three sets of values:
The minimum and maximum number of workers.
The scale-in and scale-out percentages for CPU utilization, which is determined by the
CpuUtilization metric. When the CpuUtilization metric for the connector exceeds
the scale-out percentage, MSK Connect increases the number of workers that are running in
the connector. When the CpuUtilization metric goes below the scale-in percentage, MSK
Connect decreases the number of workers. The number of workers always remains within the
minimum and maximum numbers that you specify when you create the connector.
The number of MCUs per worker.
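The following is an illustrative sketch of how the two capacity modes might be expressed in the
capacity portion of a CreateConnector request. The field names follow the MSK Connect API, and the
numbers are placeholder values rather than recommendations; confirm the exact shapes against the
CreateConnector API reference.

Provisioned mode:

"capacity": {
    "provisionedCapacity": {
        "workerCount": 2,
        "mcuCount": 1
    }
}

Autoscaled mode:

"capacity": {
    "autoScaling": {
        "minWorkerCount": 1,
        "maxWorkerCount": 4,
        "mcuCount": 1,
        "scaleInPolicy": { "cpuUtilizationPercentage": 20 },
        "scaleOutPolicy": { "cpuUtilizationPercentage": 80 }
    }
}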
For more information about workers, see the section called “Understand MSK Connect workers”. To
learn about MSK Connect metrics, see the section called “Monitoring”.
Create a connector
Creating a connector using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. In the left pane, under MSK Connect, choose Connectors.
3. Choose Create connector.
4. You can choose between using an existing custom plugin to create the connector, or creating
a new custom plugin first. For information on custom plugins and how to create them, see
the section called “Create custom plugins”. In this procedure, let's assume you have a custom
plugin that you want to use. In the list of custom plugins, find the one that you want to use,
and select the box to its left, then choose Next.
5. Enter a name and, optionally, a description.
6. Choose the cluster that you want to connect to.
7. Specify the connector configuration. The configuration parameters that you need to specify
depend on the type of connector that you want to create. However, some parameters are
common to all connectors, for example, the connector.class and tasks.max parameters.
The following is an example configuration for the Confluent Amazon S3 Sink Connector.
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=2
topics=my-example-topic
s3.region=us-east-1
s3.bucket.name=my-destination-bucket
flush.size=1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
schema.compatibility=NONE
8. Next, you configure your connector capacity. You can choose between two capacity modes:
provisioned and auto scaled. For information about these two options, see the section called
“Understand connector capacity”.
9. Choose either the default worker configuration or a custom worker configuration. For
information about creating custom worker configurations, see the section called “Understand
MSK Connect workers”.
10. Next, you specify the service execution role. This must be an IAM role that MSK Connect can
assume, and that grants the connector all the permissions that it needs to access the necessary
AWS resources. Those permissions depend on the logic of the connector. For information
about how to create this role, see the section called “Understand service execution role”.
11. Choose Next, review the security information, then choose Next again.
12. Specify the logging options that you want, then choose Next. For information about logging,
see the section called “Logging”.
13. Choose Create connector.
To use the MSK Connect API to create a connector, see CreateConnector.
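For reference, the equivalent AWS CLI call might be sketched as follows. The version number, ARNs,
and the file:// payloads (which would hold JSON such as the capacity and configuration examples
above) are placeholders, and the exact parameter shapes should be confirmed against the
CreateConnector reference.

aws kafkaconnect create-connector \
    --connector-name my-example-connector \
    --kafka-connect-version "2.7.1" \
    --capacity file://capacity.json \
    --connector-configuration file://connector-configuration.json \
    --kafka-cluster file://kafka-cluster.json \
    --kafka-cluster-client-authentication authenticationType=NONE \
    --kafka-cluster-encryption-in-transit encryptionType=PLAINTEXT \
    --plugins "customPlugin={customPluginArn=<plugin-arn>,revision=1}" \
    --service-execution-role-arn <service-execution-role-arn>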
Connecting from connectors
The following best practices can improve the performance of your connectivity to Amazon MSK
Connect.
Do not overlap IPs for Amazon VPC peering or Transit Gateway
If you are using Amazon VPC peering or Transit Gateway with Amazon MSK Connect, do not
configure your connector for reaching the peered VPC resources with IPs in the CIDR ranges:
"10.99.0.0/16"
"192.168.0.0/16"
"172.21.0.0/16"
Create custom plugins
A plugin is an AWS resource that contains the code that defines your connector logic. You upload a
JAR file (or a ZIP file that contains one or more JAR files) to an S3 bucket, and specify the location
of the bucket when you create the plugin. When you create a connector, you specify the plugin that
you want MSK Connect to use for it. The relationship of plugins to connectors is one-to-many: You
can create one or more connectors from the same plugin.
For information on how to develop the code for a connector, see the Connector Development
Guide in the Apache Kafka documentation.
Creating a custom plugin using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. In the left pane, under MSK Connect, choose Custom plugins.
3. Choose Create custom plugin.
4. Choose Browse S3.
5. In the list of S3 buckets, choose the bucket that has the JAR or ZIP file for the plugin.
6. In the list of objects, select the box to the left of the JAR or ZIP file for the plugin, then choose Choose.
7. Choose Create custom plugin.
To use the MSK Connect API to create a custom plugin, see CreateCustomPlugin.
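As a sketch, the same operation from the AWS CLI might look like the following; the bucket ARN and
file key are placeholders for the object that you uploaded.

aws kafkaconnect create-custom-plugin \
    --name my-custom-plugin \
    --content-type ZIP \
    --location '{"s3Location": {"bucketArn": "arn:aws:s3:::<my-plugin-bucket>", "fileKey": "<my-plugin.zip>"}}'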
Understand MSK Connect workers
A worker is a Java virtual machine (JVM) process that runs the connector logic. Each worker creates
a set of tasks that run in parallel threads and do the work of copying the data. Tasks don't store
state, and can therefore be started, stopped, or restarted at any time in order to provide a resilient
and scalable data pipeline. Changes to the number of workers, whether due to a scaling event or
due to unexpected failures, are automatically detected by the remaining workers. They coordinate
to rebalance tasks across the set of remaining workers. Connect workers use Apache Kafka's
consumer groups to coordinate and rebalance.
If your connector's capacity requirements are variable or difficult to estimate, you can let MSK
Connect scale the number of workers as needed between a lower limit and an upper limit that
you specify. Alternatively, you can specify the exact number of workers that you want to run your
connector logic. For more information, see the section called “Understand connector capacity”.
MSK Connect workers consume IP addresses
MSK Connect workers consume IP addresses in the customer-provided subnets. Each worker uses
one IP address from one of the customer-provided subnets. You should ensure that you have
enough available IP addresses in the subnets provided to a CreateConnector request to account for
their specified capacity, especially when autoscaling connectors where the number of workers can
fluctuate.
Default worker configuration
MSK Connect provides the following default worker configuration:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
Supported worker configuration properties
MSK Connect provides a default worker configuration. You also have the option to create a custom
worker configuration to use with your connectors. The following list includes information about the
worker configuration properties that Amazon MSK Connect does or does not support.
The key.converter and value.converter properties are required.
MSK Connect supports the following producer. configuration properties.
producer.acks
producer.batch.size
producer.buffer.memory
producer.compression.type
producer.enable.idempotence
producer.key.serializer
producer.linger.ms
producer.max.request.size
producer.metadata.max.age.ms
producer.metadata.max.idle.ms
producer.partitioner.class
producer.reconnect.backoff.max.ms
producer.reconnect.backoff.ms
producer.request.timeout.ms
producer.retry.backoff.ms
producer.value.serializer
MSK Connect supports the following consumer. configuration properties.
consumer.allow.auto.create.topics
consumer.auto.offset.reset
consumer.check.crcs
consumer.fetch.max.bytes
consumer.fetch.max.wait.ms
consumer.fetch.min.bytes
consumer.heartbeat.interval.ms
consumer.key.deserializer
consumer.max.partition.fetch.bytes
consumer.max.poll.records
consumer.metadata.max.age.ms
consumer.partition.assignment.strategy
consumer.reconnect.backoff.max.ms
consumer.reconnect.backoff.ms
consumer.request.timeout.ms
consumer.retry.backoff.ms
consumer.session.timeout.ms
consumer.value.deserializer
All other configuration properties that don't start with the producer. or consumer. prefixes
are supported except for the following properties.
access.control.
admin.
admin.listeners.https.
client.
connect.
inter.worker.
internal.
listeners.https.
metrics.
metrics.context.
rest.
sasl.
security.
socket.
ssl.
topic.tracking.
worker.
bootstrap.servers
config.storage.topic
connections.max.idle.ms
connector.client.config.override.policy
group.id
listeners
metric.reporters
plugin.path
receive.buffer.bytes
response.http.headers.config
scheduled.rebalance.max.delay.ms
send.buffer.bytes
status.storage.topic
For more information about worker configuration properties and what they represent, see Kafka
Connect Configs in the Apache Kafka documentation.
Create a custom worker configuration
Creating a custom worker configuration using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. In the left pane, under MSK Connect, choose Worker configurations.
3. Choose Create worker configuration.
4. Enter a name and an optional description, then add the properties and values that you want to
set them to.
5. Choose Create worker configuration.
To use the MSK Connect API to create a worker configuration, see CreateWorkerConfiguration.
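From the AWS CLI, a minimal sketch might look like the following. It assumes a local
worker.properties file containing your key=value properties; per the API, the file content is
passed base64 encoded (base64 -w0 is the GNU coreutils form; other platforms use different flags).

aws kafkaconnect create-worker-configuration \
    --name my-worker-configuration \
    --properties-file-content "$(base64 -w0 worker.properties)"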
Manage source connector offsets using offset.storage.topic
This section provides information to help you manage source connector offsets using the offset
storage topic. The offset storage topic is an internal topic that Kafka Connect uses to store
connector and task configuration offsets.
Considerations
Consider the following when you manage source connector offsets.
To specify an offset storage topic, provide the name of the Kafka topic where connector offsets
are stored as the value for offset.storage.topic in your worker configuration.
Use caution when you make changes to a connector configuration. Changing configuration
values may result in unintended connector behavior if a source connector uses values from
the configuration to key offset records. We recommend that you refer to your plugin's
documentation for guidance.
Customize default number of partitions – In addition to customizing the worker configuration
by adding offset.storage.topic, you can customize the number of partitions for the offset
and status storage topics. Default partitions for internal topics are as follows.
config.storage.topic: 1, not configurable, must be single partition topic
offset.storage.topic: 25, configurable by providing offset.storage.partitions
status.storage.topic: 5, configurable by providing status.storage.partitions
Manually deleting topics – Amazon MSK Connect creates new Kafka Connect internal topics
(topic name starts with __amazon_msk_connect) on every deployment of connectors.
Old topics that are attached to deleted connectors are not automatically removed because
internal topics, such as offset.storage.topic, can be reused among connectors. However,
you can manually delete unused internal topics created by MSK Connect. The internal
topics are named following the format __amazon_msk_connect_<offsets|status|
configs>_connector_name_connector_id.
The regular expression __amazon_msk_connect_<offsets|status|
configs>_connector_name_connector_id can be used to delete the internal topics. You
should not delete an internal topic that is currently in use by a running connector.
Using the same name for the internal topics created by MSK Connect – If you want to reuse
the offset storage topic to consume offsets from a previously created connector, you must
give the new connector the same name as the old connector. You can set the
offset.storage.topic property in the worker configuration so that the same offset storage
topic is reused between different connectors. This configuration is described in Managing
connector offsets. MSK Connect does not allow different connectors to share
config.storage.topic and status.storage.topic. Those topics are created each time you
create a new connector in MSK Connect. They are automatically named following the format
__amazon_msk_connect_<status|configs>_connector_name_connector_id, and so
are different across the different connectors that you create.
Use the default offset storage topic
By default, Amazon MSK Connect generates a new offset storage topic on your Kafka
cluster for each connector that you create. MSK constructs the default topic name using
parts of the connector ARN. For example, __amazon_msk_connect_offsets_my-mskc-
connector_12345678-09e7-4abc-8be8-c657f7e4ff32-2.
Use custom offset storage topic
To provide offset continuity between source connectors, you can use an offset storage topic of your
choice instead of the default topic. Specifying an offset storage topic helps you accomplish tasks
like creating a source connector that resumes reading from the last offset of a previous connector.
To specify an offset storage topic, you supply a value for the offset.storage.topic property
in your worker configuration before you create a connector. If you want to reuse the offset storage
topic to consume offsets from a previously created connector, you must give the new connector
the same name as the old connector. If you create a custom offset storage topic, you must set
cleanup.policy to compact in your topic configuration.
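For example, a custom offset storage topic could be created ahead of time with a sketch like the
following. The topic name, partition count, and replication factor are placeholder choices; only
the compaction setting is required.

<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-server <bootstrapBrokerString> --replication-factor 3 --partitions 25 --config cleanup.policy=compact --topic my-connector-offsets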
Note
If you specify an offset storage topic when you create a sink connector, MSK Connect
creates the topic if it does not already exist. However, the topic will not be used to store
connector offsets.
Sink connector offsets are instead managed using the Kafka consumer group protocol.
Each sink connector creates a group named connect-{CONNECTOR_NAME}. As long as
the consumer group exists, any successive sink connectors that you create with the same
CONNECTOR_NAME value will continue from the last committed offset.
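To see where a sink connector's consumer group currently stands, you can describe the group with
the standard Kafka tooling, for example:

<path-to-your-kafka-installation>/bin/kafka-consumer-groups.sh --bootstrap-server <bootstrapBrokerString> --describe --group connect-<CONNECTOR_NAME>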
Example : Specifying an offset storage topic to recreate a source connector with an updated
configuration
Suppose you have a change data capture (CDC) connector and you want to modify the connector
configuration without losing your place in the CDC stream. You can't update the existing connector
configuration, but you can delete the connector and create a new one with the same name.
To tell the new connector where to start reading in the CDC stream, you can specify the old
connector's offset storage topic in your worker configuration. The following steps demonstrate
how to accomplish this task.
1. On your client machine, run the following command to find the name of your connector's
offset storage topic. Replace <bootstrapBrokerString> with your cluster's bootstrap
broker string. For instructions on getting your bootstrap broker string, see Get the bootstrap
brokers for an Amazon MSK cluster.
<path-to-your-kafka-installation>/bin/kafka-topics.sh --list --bootstrap-server <bootstrapBrokerString>
The following output shows a list of all cluster topics, including any default internal
connector topics. In this example, the existing CDC connector uses the default offset
storage topic created by MSK Connect. This is why the offset storage topic is called
__amazon_msk_connect_offsets_my-mskc-connector_12345678-09e7-4abc-8be8-c657f7e4ff32-2.
__consumer_offsets
__amazon_msk_canary
__amazon_msk_connect_configs_my-mskc-connector_12345678-09e7-4abc-8be8-c657f7e4ff32-2
__amazon_msk_connect_offsets_my-mskc-connector_12345678-09e7-4abc-8be8-c657f7e4ff32-2
__amazon_msk_connect_status_my-mskc-connector_12345678-09e7-4abc-8be8-c657f7e4ff32-2
my-msk-topic-1
my-msk-topic-2
2. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
3. Choose your connector from the Connectors list. Copy and save the contents of the Connector
configuration field so that you can modify it and use it to create the new connector.
4. Choose Delete to delete the connector. Then enter the connector name in the text input field
to confirm deletion.
5. Create a custom worker configuration with values that fit your scenario. For instructions, see
Create a custom worker configuration.
In your worker configuration, you must specify the name of the offset storage topic that
you previously retrieved as the value for offset.storage.topic, as in the following
configuration.
config.providers.secretManager.param.aws.region=us-east-1
key.converter=<org.apache.kafka.connect.storage.StringConverter>
value.converter=<org.apache.kafka.connect.storage.StringConverter>
config.providers.secretManager.class=com.github.jcustenborder.kafka.config.aws.SecretsManagerConfigProvider
config.providers=secretManager
offset.storage.topic=__amazon_msk_connect_offsets_my-mskc-connector_12345678-09e7-4abc-8be8-c657f7e4ff32-2
6. Create a new connector using the worker configuration that you set up in the previous step. For instructions, see Create a connector.
Important
You must give your new connector the same name as the old connector.
Tutorial: Externalizing sensitive information using config
providers
This example shows how to externalize sensitive information for Amazon MSK Connect using an
open source configuration provider. A configuration provider lets you specify variables instead of
plaintext in a connector or worker configuration, and workers running in your connector resolve
these variables at runtime. This prevents credentials and other secrets from being stored in
plaintext. The configuration provider in the example supports retrieving configuration parameters
from AWS Secrets Manager, Amazon S3, and AWS Systems Manager (SSM). In Step 2, you can see how to
set up storage and retrieval of sensitive information for the service that you want to configure.
Considerations
Consider the following while using the MSK config provider with Amazon MSK Connect:
Assign the appropriate permissions for the config providers to the IAM service execution role.
Define the config providers in worker configurations and their implementation in the connector
configuration.
Sensitive configuration values can appear in connector logs if a plugin does not define those
values as secret. Kafka Connect treats undefined configuration values the same as any other
plaintext value. To learn more, see Preventing secrets from appearing in connector logs.
By default, MSK Connect frequently restarts a connector when the connector
uses a configuration provider. To turn off this restart behavior, you can set the
config.action.reload value to none in your connector configuration.
Create a custom plugin and upload to S3
To create a custom plugin, create a zip file that contains the connector and the msk-config-
provider by running the following commands on your local machine.
To create a custom plugin using a terminal window and Debezium as the connector
Use the AWS CLI to run commands as a superuser with credentials that allow you to access your
Amazon S3 bucket. For information on installing and setting up the AWS CLI, see Getting started with
the AWS CLI in the AWS Command Line Interface User Guide. For information on using the AWS CLI
with Amazon S3, see Using Amazon S3 with the AWS CLI in the AWS Command Line Interface User
Guide.
1. In a terminal window, create a folder named custom-plugin in your workspace using the following command.
mkdir custom-plugin && cd custom-plugin
2. Download the latest stable release of the MySQL Connector Plug-in from the Debezium site
using the following command.
wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/2.2.0.Final/debezium-connector-mysql-2.2.0.Final-plugin.tar.gz
Extract the downloaded gzip file in the custom-plugin folder using the following command.
tar xzf debezium-connector-mysql-2.2.0.Final-plugin.tar.gz
3. Download the MSK config provider zip file using the following command.
wget https://github.com/aws-samples/msk-config-providers/releases/download/r0.1.0/
msk-config-providers-0.1.0-with-dependencies.zip
Extract the downloaded zip file in the custom-plugin folder using the following command.
unzip msk-config-providers-0.1.0-with-dependencies.zip
4. Zip the contents of the MSK config provider from the above step and the custom connector
into a single file named custom-plugin.zip.
zip -r ../custom-plugin.zip *
5. Upload the file to S3 to be referenced later.
aws s3 cp ../custom-plugin.zip s3:<S3_URI_BUCKET_LOCATION>
6. On the Amazon MSK console, under the MSK Connect section, choose Custom Plugin, then
choose Create custom plugin and browse the s3:<S3_URI_BUCKET_LOCATION> S3 bucket to
select the custom plugin ZIP file you just uploaded.
7. Enter debezium-custom-plugin for the plugin name. Optionally, enter a description, and then choose Create custom plugin.
Configure parameters and permissions for different providers
You can configure parameter values in these three services:
Secrets Manager
Systems Manager Parameter Store
Amazon S3 (Simple Storage Service)
The following sections provide instructions for setting up parameters and the relevant permissions for each service.
Configure in Secrets Manager
To configure parameter values in Secrets Manager
1. Open the Secrets Manager console.
2. Create a new secret to store your credentials or secrets. For instructions, see Create an AWS
Secrets Manager secret in the AWS Secrets Manager User Guide.
3. Copy your secret's ARN.
4. Add the Secrets Manager permissions from the following example policy
to your Service execution role. Replace <arn:aws:secretsmanager:us-
east-1:123456789000:secret:MySecret-1234> with the ARN of your secret.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource": [
"<arn:aws:secretsmanager:us-
east-1:123456789000:secret:MySecret-1234>"
]
}
]
}
5. Add worker configuration and connector instructions as described in the following steps.
6. To use the Secrets Manager configuration provider, copy the following lines of code to the worker configuration textbox in Step 3:
# define name of config provider:
config.providers = secretsmanager
# provide implementation classes for secrets manager:
config.providers.secretsmanager.class = com.amazonaws.kafka.config.providers.SecretsManagerConfigProvider
# configure a config provider (if it needs additional initialization), for example you can provide a region where the secrets or parameters are located:
config.providers.secretsmanager.param.region = us-east-1
7. For the Secrets Manager configuration provider, copy the following lines of code to the connector configuration in Step 4.
#Example implementation for secrets manager variable
database.hostname=${secretsmanager:MSKAuroraDBCredentials:username}
database.password=${secretsmanager:MSKAuroraDBCredentials:password}
You may also use the above step with more configuration providers.
Configure in Systems Manager Parameter Store
To configure parameter values in Systems Manager Parameter Store
1. Open the Systems Manager console.
2. In the navigation pane, choose Parameter Store.
3. Create a new parameter in Parameter Store. For instructions, see Create a Systems Manager parameter (console) in the AWS Systems Manager User Guide.
4. Copy your parameter's ARN.
5. Add the Systems Manager permissions from the following example policy to your Service
execution role. Replace <arn:aws:ssm:us-east-1:123456789000:parameter/
MyParameterName> with the ARN of your parameter.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"ssm:GetParameterHistory",
"ssm:GetParametersByPath",
"ssm:GetParameters",
"ssm:GetParameter"
],
"Resource": "arn:aws:ssm:us-east-1:123456789000:parameter/MyParameterName"
}
]
}
6. To use the parameter store configuration provider, copy the following lines of code to the worker configuration textbox in Step 3:
# define name of config provider:
config.providers = ssm
# provide implementation classes for parameter store:
config.providers.ssm.class = com.amazonaws.kafka.config.providers.SsmParamStoreConfigProvider
# configure a config provider (if it needs additional initialization), for example you can provide a region where the secrets or parameters are located:
config.providers.ssm.param.region = us-east-1
7. For the parameter store configuration provider, copy the following lines of code to the connector configuration in Step 5.
#Example implementation for parameter store variable
schema.history.internal.kafka.bootstrap.servers=${ssm::MSKBootstrapServerAddress}
You may also bundle the above two steps with more configuration providers.
Configure in Amazon S3
To configure objects/files in Amazon S3
1. Open the Amazon S3 console.
2. Upload your object to a bucket in S3. For instructions, see Uploading objects.
3. Copy your object's ARN.
4. Add the Amazon S3 Object Read permissions from the following example policy to your
Service execution role. Replace <arn:aws:s3:::MY_S3_BUCKET/path/to/custom-
plugin.zip> with the ARN of your object.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "<arn:aws:s3:::MY_S3_BUCKET/path/to/custom-
plugin.zip>"
}
]
}
5. To use the Amazon S3 configuration provider, copy the following lines of code to the worker configuration textbox in Step 3:
# define name of config provider:
config.providers = s3import
# provide implementation classes for S3:
config.providers.s3import.class = com.amazonaws.kafka.config.providers.S3ImportConfigProvider
6. For the Amazon S3 configuration provider, copy the following lines of code to the
connector configuration in Step 4.
#Example implementation for S3 object
database.ssl.truststore.location = ${s3import:us-west-2:my_cert_bucket/path/to/trustore_unique_filename.jks}
You may also bundle the above two steps with more configuration providers.
Create a custom worker configuration with information about your
configuration provider
1. Select Worker configurations under the Amazon MSK Connect section.
2. Select Create worker configuration.
3. Enter SourceDebeziumCustomConfig in the Worker Configuration Name textbox. The Description is optional.
4. Copy the relevant configuration code based on the desired providers, and paste it into the Worker configuration textbox.
5. The following is an example of the worker configuration for all three providers:
key.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
offset.storage.topic=offsets_my_debezium_source_connector
# define names of config providers:
config.providers=secretsmanager,ssm,s3import
# provide implementation classes for each provider:
config.providers.secretsmanager.class = com.amazonaws.kafka.config.providers.SecretsManagerConfigProvider
config.providers.ssm.class = com.amazonaws.kafka.config.providers.SsmParamStoreConfigProvider
config.providers.s3import.class = com.amazonaws.kafka.config.providers.S3ImportConfigProvider
# configure a config provider (if it needs additional initialization), for example you can provide a region where the secrets or parameters are located:
config.providers.secretsmanager.param.region = us-east-1
config.providers.ssm.param.region = us-east-1
6. Choose Create worker configuration.
Create the connector
1. Create a new connector using the instructions in Create a connector.
2. Choose the custom-plugin.zip file that you uploaded to your S3 bucket in the section called “Create a custom plugin and upload to S3” as the source for the custom plugin.
3. Copy the relevant configuration code based on the desired providers, and paste it into the Connector configuration field.
4. The following is an example of the connector configuration for all three providers:
#Example implementation for parameter store variable
schema.history.internal.kafka.bootstrap.servers=${ssm::MSKBootstrapServerAddress}
#Example implementation for secrets manager variable
database.hostname=${secretsmanager:MSKAuroraDBCredentials:username}
database.password=${secretsmanager:MSKAuroraDBCredentials:password}
#Example implementation for Amazon S3 file/object
database.ssl.truststore.location = ${s3import:us-west-2:my_cert_bucket/path/to/trustore_unique_filename.jks}
5. Select Use a custom configuration and choose SourceDebeziumCustomConfig from the
Worker Configuration dropdown.
6. Follow the remaining steps in Create a connector.
IAM roles and policies for MSK Connect
This section helps you set up the appropriate IAM policies and roles to securely deploy and
manage Amazon MSK Connect within your AWS environment. The following sections explain the
service execution role that must be used with MSK Connect, including the required trust policy
and additional permissions needed when connecting to an IAM-authenticated MSK cluster. This
section also provides examples of comprehensive IAM policies to grant full access to MSK Connect
functionality, as well as details on AWS managed policies available for the service.
Topics
Understand service execution role
Use examples of IAM policies for MSK Connect
Prevent cross-service confused deputy problem
AWS managed policies for MSK Connect
Use service-linked roles for MSK Connect
Understand service execution role
Note
Amazon MSK Connect does not support using the Service-linked role as the service
execution role. You must create a separate service execution role. For instructions on how
to create a custom IAM role, see Creating a role to delegate permissions to an AWS service
in the IAM User Guide.
When you create a connector with MSK Connect, you are required to specify an AWS Identity and
Access Management (IAM) role to use with it. Your service execution role must have the following
trust policy so that MSK Connect can assume it. For information about the condition context keys
in this policy, see the section called “Prevent cross-service confused deputy problem”.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "kafkaconnect.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "Account-ID"
},
"ArnLike": {
"aws:SourceArn": "MSK-Connector-ARN"
}
}
}
]
}
If the Amazon MSK cluster that you want to use with your connector is a cluster that uses IAM
authentication, then you must add the following permissions policy to the connector's service
execution role. For information on how to find your cluster's UUID and how to construct topic
ARNs, see the section called “Authorization policy resources”.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kafka-cluster:Connect",
"kafka-cluster:DescribeCluster"
],
"Resource": [
"cluster-arn"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:ReadData",
"kafka-cluster:DescribeTopic"
],
"Resource": [
"ARN of the topic that you want a sink connector to read from"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:WriteData",
"kafka-cluster:DescribeTopic"
],
"Resource": [
"ARN of the topic that you want a source connector to write to"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:CreateTopic",
"kafka-cluster:WriteData",
"kafka-cluster:ReadData",
"kafka-cluster:DescribeTopic"
],
"Resource": [
"arn:aws:kafka:region:account-id:topic/cluster-name/cluster-uuid/
__amazon_msk_connect_*"
]
},
{
"Effect": "Allow",
"Action": [
"kafka-cluster:AlterGroup",
"kafka-cluster:DescribeGroup"
],
"Resource": [
"arn:aws:kafka:region:account-id:group/cluster-name/cluster-uuid/
__amazon_msk_connect_*",
"arn:aws:kafka:region:account-id:group/cluster-name/cluster-uuid/
connect-*"
]
}
]
}
Depending on the kind of connector, you might also need to attach to the service execution role
a permissions policy that allows it to access AWS resources. For example, if your connector needs
to send data to an S3 bucket, then the service execution role must have a permissions policy that
grants permission to write to that bucket. For testing purposes, you can use one of the pre-built
IAM policies that give full access, like arn:aws:iam::aws:policy/AmazonS3FullAccess.
However, for security purposes, we recommend that you use the most restrictive policy that allows
your connector to read from the AWS source or write to the AWS sink.
Use examples of IAM policies for MSK Connect
To give a non-admin user full access to all MSK Connect functionality, attach a policy like the
following one to the user's IAM role.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kafkaconnect:*",
"ec2:CreateNetworkInterface",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"ec2:DescribeSecurityGroups",
"logs:CreateLogDelivery",
"logs:GetLogDelivery",
"logs:DeleteLogDelivery",
"logs:ListLogDeliveries",
"logs:PutResourcePolicy",
"logs:DescribeResourcePolicies",
"logs:DescribeLogGroups"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "arn:aws:iam::*:role/aws-service-role/
kafkaconnect.amazonaws.com/AWSServiceRoleForKafkaConnect*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "kafkaconnect.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:AttachRolePolicy",
"iam:PutRolePolicy"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/
kafkaconnect.amazonaws.com/AWSServiceRoleForKafkaConnect*"
},
{
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "arn:aws:iam::*:role/aws-service-role/
delivery.logs.amazonaws.com/AWSServiceRoleForLogDelivery*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "delivery.logs.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"s3:PutBucketPolicy",
"s3:GetBucketPolicy"
],
"Resource": "ARN of the Amazon S3 bucket to which you want MSK Connect to
deliver logs"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "ARN of the service execution role"
},
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "ARN of the Amazon S3 object that corresponds to the custom
plugin that you want to use for creating connectors"
},
{
"Effect": "Allow",
"Action": "firehose:TagDeliveryStream",
"Resource": "ARN of the Firehose delivery stream to which you want MSK
Connect to deliver logs"
}
]
}
Prevent cross-service confused deputy problem
The confused deputy problem is a security issue where an entity that doesn't have permission to
perform an action can coerce a more-privileged entity to perform the action. In AWS, cross-service
impersonation can result in the confused deputy problem. Cross-service impersonation can occur
when one service (the calling service) calls another service (the called service). The calling service
can be manipulated to use its permissions to act on another customer's resources in a way it should
not otherwise have permission to access. To prevent this, AWS provides tools that help you protect
your data for all services with service principals that have been given access to resources in your
account.
We recommend using the aws:SourceArn and aws:SourceAccount global condition context
keys in resource policies to limit the permissions that MSK Connect gives another service to
the resource. If the aws:SourceArn value does not contain the account ID (for example, an
Amazon S3 bucket ARN doesn't contain the account ID), you must use both global condition
context keys to limit permissions. If you use both global condition context keys and the
aws:SourceArn value contains the account ID, the aws:SourceAccount value and the account
in the aws:SourceArn value must use the same account ID when used in the same policy
statement. Use aws:SourceArn if you want only one resource to be associated with the cross-
service access. Use aws:SourceAccount if you want to allow any resource in that account to be
associated with the cross-service use.
In the case of MSK Connect, the value of aws:SourceArn must be an MSK connector.
The most effective way to protect against the confused deputy problem is to use the
aws:SourceArn global condition context key with the full ARN of the resource. If you don't know
the full ARN of the resource or if you are specifying multiple resources, use the aws:SourceArn
global context condition key with wildcards (*) for the unknown portions of the ARN. For example,
arn:aws:kafkaconnect:us-east-1:123456789012:connector/* represents all connectors
that belong to the account with ID 123456789012 in the US East (N. Virginia) Region.
The following example shows how you can use the aws:SourceArn and aws:SourceAccount
global condition context keys in MSK Connect to prevent the confused deputy problem. Replace
Account-ID and MSK-Connector-ARN with your information.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": " kafkaconnect.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "Account-ID"
},
"ArnLike": {
"aws:SourceArn": "MSK-Connector-ARN"
}
}
}
]
}
AWS managed policies for MSK Connect
An AWS managed policy is a standalone policy that is created and administered by AWS. AWS
managed policies are designed to provide permissions for many common use cases so that you can
start assigning permissions to users, groups, and roles.
Keep in mind that AWS managed policies might not grant least-privilege permissions for your
specific use cases because they're available for all AWS customers to use. We recommend that you
reduce permissions further by defining customer managed policies that are specific to your use
cases.
You cannot change the permissions defined in AWS managed policies. If AWS updates the
permissions defined in an AWS managed policy, the update affects all principal identities (users,
groups, and roles) that the policy is attached to. AWS is most likely to update an AWS managed
policy when a new AWS service is launched or new API operations become available for existing
services.
For more information, see AWS managed policies in the IAM User Guide.
AWS managed policy: AmazonMSKConnectReadOnlyAccess
This policy grants the user the permissions that are needed to list and describe MSK Connect
resources.
You can attach the AmazonMSKConnectReadOnlyAccess policy to your IAM identities.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kafkaconnect:ListConnectors",
"kafkaconnect:ListCustomPlugins",
"kafkaconnect:ListWorkerConfigurations"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"kafkaconnect:DescribeConnector"
],
"Resource": [
"arn:aws:kafkaconnect:*:*:connector/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafkaconnect:DescribeCustomPlugin"
],
"Resource": [
"arn:aws:kafkaconnect:*:*:custom-plugin/*"
]
},
{
"Effect": "Allow",
"Action": [
"kafkaconnect:DescribeWorkerConfiguration"
],
"Resource": [
"arn:aws:kafkaconnect:*:*:worker-configuration/*"
]
}
]
}
AWS managed policy: KafkaConnectServiceRolePolicy
This policy grants the MSK Connect service the permissions that are needed to create and manage
network interfaces that have the tag AmazonMSKConnectManaged:true. These network
interfaces give MSK Connect network access to resources in your Amazon VPC, such as an Apache
Kafka cluster or a source or a sink.
You can't attach KafkaConnectServiceRolePolicy to your IAM entities. This policy is attached to a
service-linked role that allows MSK Connect to perform actions on your behalf.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface"
],
"Resource": "arn:aws:ec2:*:*:network-interface/*",
"Condition": {
"StringEquals": {
"aws:RequestTag/AmazonMSKConnectManaged": "true"
},
"ForAllValues:StringEquals": {
"aws:TagKeys": "AmazonMSKConnectManaged"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface"
],
"Resource": [
"arn:aws:ec2:*:*:subnet/*",
"arn:aws:ec2:*:*:security-group/*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags"
],
"Resource": "arn:aws:ec2:*:*:network-interface/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreateNetworkInterface"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DescribeNetworkInterfaces",
"ec2:CreateNetworkInterfacePermission",
"ec2:AttachNetworkInterface",
"ec2:DetachNetworkInterface",
"ec2:DeleteNetworkInterface"
],
"Resource": "arn:aws:ec2:*:*:network-interface/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/AmazonMSKConnectManaged": "true"
}
}
}
]
}
MSK Connect updates to AWS managed policies
View details about updates to AWS managed policies for MSK Connect since this service began
tracking these changes.
Change: MSK Connect updated read-only policy
Description: MSK Connect updated the AmazonMSKConnectReadOnlyAccess policy to remove the restrictions on listing operations.
Date: October 13, 2021

Change: MSK Connect started tracking changes
Description: MSK Connect started tracking changes for its AWS managed policies.
Date: September 14, 2021
Use service-linked roles for MSK Connect
Amazon MSK Connect uses AWS Identity and Access Management (IAM) service-linked roles. A
service-linked role is a unique type of IAM role that is linked directly to MSK Connect. Service-
linked roles are predefined by MSK Connect and include all the permissions that the service
requires to call other AWS services on your behalf.
A service-linked role makes setting up MSK Connect easier because you don't have to manually add
the necessary permissions. MSK Connect defines the permissions of its service-linked roles, and
unless defined otherwise, only MSK Connect can assume its roles. The defined permissions include
the trust policy and the permissions policy, and that permissions policy cannot be attached to any
other IAM entity.
For information about other services that support service-linked roles, see AWS Services That Work
with IAM and look for the services that have Yes in the Service-Linked Role column. Choose a Yes
with a link to view the service-linked role documentation for that service.
Service-linked role permissions for MSK Connect
MSK Connect uses the service-linked role named AWSServiceRoleForKafkaConnect – Allows
Amazon MSK Connect to access Amazon resources on your behalf.
The AWSServiceRoleForKafkaConnect service-linked role trusts the
kafkaconnect.amazonaws.com service to assume the role.
For information about the permissions policy that the role uses, see the section called
“KafkaConnectServiceRolePolicy”.
You must configure permissions to allow an IAM entity (such as a user, group, or role) to create,
edit, or delete a service-linked role. For more information, see Service-Linked Role Permissions in
the IAM User Guide.
Creating a service-linked role for MSK Connect
You don't need to manually create a service-linked role. When you create a connector in the AWS
Management Console, the AWS CLI, or the AWS API, MSK Connect creates the service-linked role
for you.
If you delete this service-linked role, and then need to create it again, you can use the same process
to recreate the role in your account. When you create a connector, MSK Connect creates the
service-linked role for you again.
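If you want to verify that the role exists in your account, or recreate it without creating a
connector, you can use the AWS CLI. The following is a minimal sketch using standard IAM
commands; the role name and service principal are the ones described in this section.
# Check whether the service-linked role already exists
aws iam get-role --role-name AWSServiceRoleForKafkaConnect
# Create the service-linked role manually if it doesn't exist
aws iam create-service-linked-role --aws-service-name kafkaconnect.amazonaws.com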
Editing a service-linked role for MSK Connect
MSK Connect does not allow you to edit the AWSServiceRoleForKafkaConnect service-linked
role. After you create a service-linked role, you can't change the name of the role because various
entities might reference the role. However, you can edit the description of the role using IAM. For
more information, see Editing a Service-Linked Role in the IAM User Guide.
Deleting a service-linked role for MSK Connect
You can use the IAM console, the AWS CLI or the AWS API to manually delete the service-linked
role. To do this, you must first manually delete all of your MSK Connect connectors, and then you
can manually delete the role. For more information, see Deleting a Service-Linked Role in the IAM
User Guide.
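As a sketch, after you delete all of your MSK Connect connectors, you can delete the role and
check the deletion status with the following standard IAM commands.
# Request deletion of the service-linked role
aws iam delete-service-linked-role --role-name AWSServiceRoleForKafkaConnect
# Check the status of the deletion task, using the task ID returned by the previous command
aws iam get-service-linked-role-deletion-status --deletion-task-id <deletion-task-id>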
Supported Regions for MSK Connect service-linked roles
MSK Connect supports using service-linked roles in all of the regions where the service is available.
For more information, see AWS Regions and Endpoints.
Enable internet access for Amazon MSK Connect
If your connector for Amazon MSK Connect needs access to the internet, we recommend that you
use the following Amazon Virtual Private Cloud (VPC) settings to enable that access.
Configure your connector with private subnets.
Create a public NAT gateway or NAT instance for your VPC in a public subnet. For more
information, see the Connect subnets to the internet or other VPCs using NAT devices page in
the Amazon Virtual Private Cloud User Guide.
Allow outbound traffic from your private subnets to your NAT gateway or instance.
Set up a NAT gateway for Amazon MSK Connect
The following steps show you how to set up a NAT gateway to enable internet access for a
connector. You must complete these steps before you create a connector in a private subnet.
Complete prerequisites for setting up a NAT gateway
Make sure you have the following items.
The ID of the Amazon Virtual Private Cloud (VPC) associated with your cluster. For example,
vpc-123456ab.
The IDs of the private subnets in your VPC. For example, subnet-a1b2c3de, subnet-f4g5h6ij, etc.
You must configure your connector with private subnets.
Steps to enable internet access for your connector
To enable internet access for your connector
1. Open the Amazon Virtual Private Cloud console at https://console.aws.amazon.com/vpc/.
2. Create a public subnet for your NAT gateway with a descriptive name, and note the subnet ID.
For detailed instructions, see Create a subnet in your VPC.
3. Create an internet gateway so that your VPC can communicate with the internet, and note the
gateway ID. Attach the internet gateway to your VPC. For instructions, see Create and attach
an internet gateway.
4. Provision a public NAT gateway so that hosts in your private subnets can reach your public
subnet. When you create the NAT gateway, select the public subnet that you created earlier.
For instructions, see Create a NAT gateway.
5. Configure your route tables. You must have two route tables in total to complete this setup.
You should already have a main route table that was automatically created at the same time as
your VPC. In this step you create an additional route table for your public subnet.
a. Use the following settings to modify your VPC's main route table so that your private
subnets route traffic to your NAT gateway. For instructions, see Work with route tables in
the Amazon Virtual Private Cloud User Guide.
Private MSKC route table
Name tag: We recommend that you give this route table a descriptive name tag to help you
identify it. For example, Private MSKC.
Associated subnets: Your private subnets.
Route to enable internet access for MSK Connect: Destination 0.0.0.0/0; Target: your NAT
gateway ID (for example, nat-12a345bc6789efg1h).
Local route for internal traffic: Destination 10.0.0.0/16 (this value may differ depending on
your VPC's CIDR block); Target: Local.
b. Follow the instructions in Create a custom route table to create a route table for your
public subnet. When you create the table, enter a descriptive name in the Name tag field
to help you identify which subnet the table is associated with. For example, Public MSKC.
c. Configure your Public MSKC route table using the following settings.
Name tag: Public MSKC, or a different descriptive name that you choose.
Associated subnets: Your public subnet with the NAT gateway.
Route to enable internet access for MSK Connect: Destination 0.0.0.0/0; Target: your internet
gateway ID (for example, igw-1a234bc5).
Local route for internal traffic: Destination 10.0.0.0/16 (this value may differ depending on
your VPC's CIDR block); Target: Local.
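As a sketch, the following AWS CLI commands create the two routes described in the preceding
tables. The route table, subnet, NAT gateway, and internet gateway IDs are placeholders for the
IDs that you noted in the earlier steps.
# Main route table: send private subnet traffic to the NAT gateway
aws ec2 create-route --route-table-id <main-route-table-id> --destination-cidr-block 0.0.0.0/0 --nat-gateway-id <nat-gateway-id>
# Public MSKC route table: send public subnet traffic to the internet gateway
aws ec2 create-route --route-table-id <public-route-table-id> --destination-cidr-block 0.0.0.0/0 --gateway-id <internet-gateway-id>
# Associate the public route table with the public subnet
aws ec2 associate-route-table --route-table-id <public-route-table-id> --subnet-id <public-subnet-id>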
Understand private DNS hostnames
With Private DNS hostname support in MSK Connect, you can configure connectors to reference
public or private domain names. Support depends on the DNS servers specified in the VPC DHCP
option set.
A DHCP option set is a group of network configurations that EC2 instances use in a VPC to
communicate over the VPC network. Each VPC has a default DHCP option set, but you can create
a custom DHCP option set if you want instances in a VPC to use a different DNS server for domain
name resolution, instead of the Amazon-provided DNS server. See DHCP option sets in Amazon
VPC.
Before the private DNS resolution capability was included with MSK Connect, connectors
used the service VPC DNS resolvers for DNS queries from a customer connector. Connectors did not
use the DNS servers defined in the customer VPC DHCP option sets for DNS resolution.
Connectors could only reference hostnames in customer connector configurations or plugins that
were publicly resolvable. They couldn't resolve private hostnames defined in a privately hosted
zone or use DNS servers in another customer network.
Without private DNS, customers who kept their databases, data warehouses, and systems such as
Secrets Manager in their own VPC, inaccessible from the internet, couldn't use them with MSK
connectors. Customers often use private DNS hostnames to comply with their corporate security
posture.
Configure a VPC DHCP option set for your connector
Connectors automatically use the DNS servers defined in their VPC DHCP option set when the
connector is created. Before you create a connector, make sure that you configure the VPC DHCP
option set for your connector's DNS hostname resolution requirements.
Connectors that you created before the Private DNS hostname feature was available in MSK
Connect continue to use the previous DNS resolution configuration with no modification required.
If you need only publicly resolvable DNS hostname resolution in your connector, for easier setup
we recommend using the default VPC of your account when you create the connector. See Amazon
DNS Server in the Amazon VPC User Guide for more information on the Amazon-provided DNS
server or Amazon Route53 Resolver.
If you need to resolve private DNS hostnames, make sure the VPC that is passed during connector
creation has its DHCP options set correctly configured. For more information, see Work with DHCP
option sets in the Amazon VPC User Guide.
When you configure a DHCP option set for private DNS hostname resolution, ensure that the
connector can reach the custom DNS servers that you configure in the DHCP option set. Otherwise,
your connector creation will fail.
After you customize the VPC DHCP option set, connectors subsequently created in that VPC use
the DNS servers that you specified in the option set. If you change the option set after you create a
connector, the connector adopts the settings in the new option set within a couple of minutes.
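As a sketch, the following AWS CLI commands create a DHCP option set that points to custom
DNS servers and associate it with a VPC. The DNS server IP addresses and resource IDs are
placeholders.
# Create a DHCP option set that uses your custom DNS servers
aws ec2 create-dhcp-options --dhcp-configurations "Key=domain-name-servers,Values=<dns-server-1>,<dns-server-2>"
# Associate the option set with the VPC that you pass during connector creation
aws ec2 associate-dhcp-options --dhcp-options-id <dhcp-options-id> --vpc-id <vpc-id>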
Configure DNS attributes for your VPC
Make sure you have the VPC DNS attributes correctly configured as described in DNS attributes in
your VPC and DNS hostnames in the Amazon VPC User Guide.
See Resolving DNS queries between VPCs and your network in the Amazon Route53 Developer
Guide for information on using inbound and outbound resolver endpoints to connect other
networks to your VPC to work with your connector.
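As a sketch, you can check and enable these VPC DNS attributes with the following AWS CLI
commands; the VPC ID is a placeholder.
# Check the current DNS attributes of the VPC
aws ec2 describe-vpc-attribute --vpc-id <vpc-id> --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id <vpc-id> --attribute enableDnsHostnames
# Enable the attributes if they are turned off
aws ec2 modify-vpc-attribute --vpc-id <vpc-id> --enable-dns-support '{"Value":true}'
aws ec2 modify-vpc-attribute --vpc-id <vpc-id> --enable-dns-hostnames '{"Value":true}'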
Handle connector creation failures
This section describes possible connector creation failures associated with DNS resolution and
suggested actions to resolve the issues.
Failure: Connector creation fails if a DNS resolution query fails, or if DNS servers are
unreachable from the connector. You can see connector creation failures due to unsuccessful DNS
resolution queries in your CloudWatch logs, if you've configured these logs for your connector.
Suggested action: Check the DNS server configuration and ensure network connectivity to the
DNS servers from the connector.

Failure: If you change the DNS server configuration in your VPC DHCP option set while a
connector is running, DNS resolution queries from the connector can fail. If DNS resolution
fails, some of the connector tasks can enter a failed state. You can see these failures in your
CloudWatch logs, if you've configured these logs for your connector.
Suggested action: The failed tasks should automatically restart to bring the connector back up.
If that does not happen, you can contact support to restart the failed tasks for your connector,
or you can recreate the connector.
Logging for MSK Connect
MSK Connect can write log events that you can use to debug your connector. When you create a
connector, you can specify zero or more of the following log destinations:
Amazon CloudWatch Logs: You specify the log group to which you want MSK Connect to send
your connector's log events. For information on how to create a log group, see Create a log
group in the CloudWatch Logs User Guide.
Amazon S3: You specify the S3 bucket to which you want MSK Connect to send your connector's
log events. For information on how to create an S3 bucket, see Creating a bucket in the Amazon
S3 User Guide.
Amazon Data Firehose: You specify the delivery stream to which you want MSK Connect to send
your connector's log events. For information on how to create a delivery stream, see Creating an
Amazon Data Firehose delivery stream in the Firehose User Guide.
To learn more about setting up logging, see Enabling logging from certain AWS services in the
Amazon CloudWatch Logs User Guide.
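As a sketch, the following logDelivery fragment shows how these destinations appear in a
create-connector request. The log group, bucket, and delivery stream names are placeholders,
and you can specify any combination of the three destinations.
"logDelivery": {
    "workerLogDelivery": {
        "cloudWatchLogs": {
            "enabled": true,
            "logGroup": "<my-connector-log-group>"
        },
        "s3": {
            "enabled": true,
            "bucket": "<my-connector-log-bucket>"
        },
        "firehose": {
            "enabled": true,
            "deliveryStream": "<my-connector-log-stream>"
        }
    }
}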
MSK Connect emits the following types of log events:
INFO: Runtime events of interest at startup and shutdown.
WARN: Runtime situations that aren't errors but are undesirable or unexpected.
FATAL: Severe errors that cause premature termination.
ERROR: Unexpected conditions and runtime errors that aren't fatal.
The following is an example of a log event sent to CloudWatch Logs:
[Worker-0bb8afa0b01391c41] [2021-09-06 16:02:54,151] WARN [Producer
clientId=producer-1] Connection to node 1 (b-1.my-test-cluster.twwhtj.c2.kafka.us-
east-1.amazonaws.com/INTERNAL_IP) could not be established. Broker may not be
available. (org.apache.kafka.clients.NetworkClient:782)
Preventing secrets from appearing in connector logs
Note
Sensitive configuration values can appear in connector logs if a plugin does not define
those values as secret. Kafka Connect treats undefined configuration values the same as
any other plaintext value.
If your plugin defines a property as secret, Kafka Connect redacts the property's value from
connector logs. For example, the following connector logs demonstrate that if a plugin defines
aws.secret.key as a PASSWORD type, then its value is replaced with [hidden].
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b] [2022-01-11
15:18:55,150] INFO SecretsManagerConfigProviderConfig values:
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b] aws.access.key =
my_access_key
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b] aws.region = us-east-1
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b] aws.secret.key
= [hidden]
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b] secret.prefix =
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b] secret.ttl.ms = 300000
2022-01-11T15:18:55.000+00:00 [Worker-05e6586a48b5f331b]
(com.github.jcustenborder.kafka.config.aws.SecretsManagerConfigProviderConfig:361)
To prevent secrets from appearing in connector log files, a plugin developer must use the Kafka
Connect enum constant ConfigDef.Type.PASSWORD to define sensitive properties. When a
property is type ConfigDef.Type.PASSWORD, Kafka Connect excludes its value from connector
logs even if the value is sent as plaintext.
Monitoring MSK Connect
Monitoring is an important part of maintaining the reliability, availability, and performance of MSK
Connect and your other AWS solutions. Amazon CloudWatch monitors your AWS resources and the
applications that you run on AWS in real time. You can collect and track metrics, create customized
dashboards, and set alarms that notify you or take actions when a specified metric reaches a
threshold that you specify. For example, you can have CloudWatch track CPU usage or other
metrics of your connector, so that you can increase its capacity if needed. For more information,
see the Amazon CloudWatch User Guide.
The following table shows the metrics that MSK Connect sends to CloudWatch under the
ConnectorName dimension. MSK Connect delivers these metrics by default and at no additional
cost. CloudWatch keeps these metrics for 15 months, so that you can access historical information
and gain a better perspective on how your connectors are performing. You can also set alarms that
watch for certain thresholds, and send notifications or take actions when those thresholds are met.
For more information, see the Amazon CloudWatch User Guide.
MSK Connect metrics
BytesInPerSec: The total number of bytes received by the connector.
BytesOutPerSec: The total number of bytes delivered by the connector.
CpuUtilization: The percentage of CPU consumption by system and user.
ErroredTaskCount: The number of tasks that have errored out.
MemoryUtilization: The percentage of the total memory in use on a worker instance, not just the
Java virtual machine (JVM) heap memory. The JVM doesn't typically release memory back to the
operating system, so JVM heap usage usually starts at a minimum heap size and incrementally
increases to a stable maximum of about 80-90%. JVM heap usage might increase or decrease as
the connector's actual memory usage changes.
RebalanceCompletedTotal: The total number of rebalances completed by this connector.
RebalanceTimeAvg: The average time in milliseconds spent by the connector on rebalancing.
RebalanceTimeMax: The maximum time in milliseconds spent by the connector on rebalancing.
RebalanceTimeSinceLast: The time in milliseconds since this connector completed the most
recent rebalance.
RunningTaskCount: The number of running tasks in the connector.
SinkRecordReadRate: The average per-second number of records read from the Apache Kafka or
Amazon MSK cluster.
SinkRecordSendRate: The average per-second number of records that are output from the
transformations and sent to the destination. This number doesn't include filtered records.
SourceRecordPollRate: The average per-second number of records produced or polled.
SourceRecordWriteRate: The average per-second number of records output from the
transformations and written to the Apache Kafka or Amazon MSK cluster.
TaskStartupAttemptsTotal: The total number of task startups that the connector has attempted.
You can use this metric to identify anomalies in task startup attempts.
TaskStartupSuccessPercentage: The average percentage of successful task starts for the
connector. You can use this metric to identify anomalies in task startup attempts.
WorkerCount: The number of workers that are running in the connector.
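As a sketch, the following command creates a CloudWatch alarm that fires when any task errors
out. The AWS/KafkaConnect namespace, the connector name, and the SNS topic ARN are
assumptions to adapt to your account.
aws cloudwatch put-metric-alarm \
    --alarm-name my-connector-errored-tasks \
    --namespace "AWS/KafkaConnect" \
    --metric-name ErroredTaskCount \
    --dimensions Name=ConnectorName,Value=<my-connector-name> \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions <arn-of-your-sns-topic>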
Examples to set up Amazon MSK Connect resources
This section includes examples to help you set up Amazon MSK Connect resources such as common
third-party connectors and configuration providers.
Topics
Set up Amazon S3 sink connector
Use Debezium source connector with configuration provider
Set up Amazon S3 sink connector
This example shows how to use the Confluent Amazon S3 sink connector and the AWS CLI to
create an Amazon S3 sink connector in MSK Connect.
1. Copy the following JSON and paste it in a new file. Replace the placeholder strings with
values that correspond to your Amazon MSK cluster's bootstrap servers connection string
and the cluster's subnet and security group IDs. For information about how to set up a service
execution role, see the section called “IAM roles and policies”.
{
"connectorConfiguration": {
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"s3.region": "us-east-1",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "1",
"schema.compatibility": "NONE",
"topics": "my-test-topic",
"tasks.max": "2",
"partitioner.class":
"io.confluent.connect.storage.partitioner.DefaultPartitioner",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"s3.bucket.name": "my-test-bucket"
},
"connectorName": "example-S3-sink-connector",
"kafkaCluster": {
"apacheKafkaCluster": {
"bootstrapServers": "<cluster-bootstrap-servers-string>",
"vpc": {
"subnets": [
"<cluster-subnet-1>",
"<cluster-subnet-2>",
"<cluster-subnet-3>"
],
"securityGroups": ["<cluster-security-group-id>"]
}
}
},
"capacity": {
"provisionedCapacity": {
"mcuCount": 2,
"workerCount": 4
}
},
"kafkaConnectVersion": "2.7.1",
"serviceExecutionRoleArn": "<arn-of-a-role-that-msk-connect-can-assume>",
"plugins": [
{
"customPlugin": {
"customPluginArn": "<arn-of-custom-plugin-that-contains-connector-
code>",
"revision": 1
}
}
],
"kafkaClusterEncryptionInTransit": {"encryptionType": "PLAINTEXT"},
"kafkaClusterClientAuthentication": {"authenticationType": "NONE"}
}
2. Run the following AWS CLI command in the folder where you saved the JSON file in the
previous step.
aws kafkaconnect create-connector --cli-input-json file://connector-info.json
The following is an example of the output that you get when you run the command
successfully.
{
"ConnectorArn": "arn:aws:kafkaconnect:us-east-1:123450006789:connector/example-
S3-sink-connector/abc12345-abcd-4444-a8b9-123456f513ed-2",
"ConnectorState": "CREATING",
"ConnectorName": "example-S3-sink-connector"
}
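Because the connector starts in the CREATING state, you can poll it until it is running. As a
sketch, the following command uses the ARN from the previous output.
aws kafkaconnect describe-connector --connector-arn "arn:aws:kafkaconnect:us-east-1:123450006789:connector/example-S3-sink-connector/abc12345-abcd-4444-a8b9-123456f513ed-2"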
Use Debezium source connector with configuration provider
This example shows how to use the Debezium MySQL connector plugin with a MySQL-compatible
Amazon Aurora database as the source. In this example, we also set up the open-source AWS
Secrets Manager Config Provider to externalize database credentials in AWS Secrets Manager. To
learn more about configuration providers, see Tutorial: Externalizing sensitive information using
config providers.
Important
The Debezium MySQL connector plugin supports only one task and does not work with
autoscaled capacity mode for Amazon MSK Connect. You should instead use provisioned
capacity mode and set workerCount equal to one in your connector configuration. To
learn more about the capacity modes for MSK Connect, see Understand connector capacity.
Complete prerequisites to use Debezium source connector
Your connector must be able to access the internet so that it can interact with services such as AWS
Secrets Manager that are outside of your Amazon Virtual Private Cloud. The steps in this section
help you complete the following tasks to enable internet access.
Set up a public subnet that hosts a NAT gateway and routes traffic to an internet gateway in your
VPC.
Create a default route that directs your private subnet traffic to your NAT gateway.
For more information, see Enable internet access for Amazon MSK Connect.
Prerequisites
Before you can enable internet access, you need the following items:
The ID of the Amazon Virtual Private Cloud (VPC) associated with your cluster. For example,
vpc-123456ab.
The IDs of the private subnets in your VPC. For example, subnet-a1b2c3de, subnet-f4g5h6ij, etc.
You must configure your connector with private subnets.
To enable internet access for your connector
1. Open the Amazon Virtual Private Cloud console at https://console.aws.amazon.com/vpc/.
2. Create a public subnet for your NAT gateway with a descriptive name, and note the subnet ID.
For detailed instructions, see Create a subnet in your VPC.
3. Create an internet gateway so that your VPC can communicate with the internet, and note the
gateway ID. Attach the internet gateway to your VPC. For instructions, see Create and attach
an internet gateway.
4. Provision a public NAT gateway so that hosts in your private subnets can reach your public
subnet. When you create the NAT gateway, select the public subnet that you created earlier.
For instructions, see Create a NAT gateway.
5. Configure your route tables. You must have two route tables in total to complete this setup.
You should already have a main route table that was automatically created at the same time as
your VPC. In this step you create an additional route table for your public subnet.
a. Use the following settings to modify your VPC's main route table so that your private
subnets route traffic to your NAT gateway. For instructions, see Work with route tables in
the Amazon Virtual Private Cloud User Guide.
Private MSKC route table
Name tag: We recommend that you give this route table a descriptive name tag to help you
identify it. For example, Private MSKC.
Associated subnets: Your private subnets.
Route to enable internet access for MSK Connect: Destination 0.0.0.0/0; Target: your NAT
gateway ID (for example, nat-12a345bc6789efg1h).
Local route for internal traffic: Destination 10.0.0.0/16 (this value may differ depending on
your VPC's CIDR block); Target: Local.
b. Follow the instructions in Create a custom route table to create a route table for your
public subnet. When you create the table, enter a descriptive name in the Name tag field
to help you identify which subnet the table is associated with. For example, Public MSKC.
c. Configure your Public MSKC route table using the following settings.
Name tag: Public MSKC, or a different descriptive name that you choose.
Associated subnets: Your public subnet with the NAT gateway.
Route to enable internet access for MSK Connect: Destination 0.0.0.0/0; Target: your internet
gateway ID (for example, igw-1a234bc5).
Local route for internal traffic: Destination 10.0.0.0/16 (this value may differ depending on
your VPC's CIDR block); Target: Local.
Now that you have enabled internet access for Amazon MSK Connect you are ready to create a
connector.
Create a Debezium source connector
1. Create a custom plugin
a. Download the MySQL connector plugin for the latest stable release from the Debezium
site. Make a note of the Debezium release version you download (version 2.x, or the older
series 1.x). Later in this procedure, you'll create a connector based on your Debezium
version.
b. Download and extract the AWS Secrets Manager Config Provider.
c. Place the following archives into the same directory:
The debezium-connector-mysql folder
The jcustenborder-kafka-config-provider-aws-0.1.1 folder
d. Compress the directory that you created in the previous step into a ZIP file and then
upload the ZIP file to an S3 bucket. For instructions, see Uploading objects in the Amazon
S3 User Guide.
e.
Copy the following JSON and paste it in a file. For example, debezium-source-custom-
plugin.json. Replace <example-custom-plugin-name> with the name that you
want the plugin to have, <arn-of-your-s3-bucket> with the ARN of the S3 bucket
where you uploaded the ZIP file, and <file-key-of-ZIP-object> with the file key of
the ZIP object that you uploaded to S3.
{
"name": "<example-custom-plugin-name>",
"contentType": "ZIP",
"location": {
"s3Location": {
"bucketArn": "<arn-of-your-s3-bucket>",
"fileKey": "<file-key-of-ZIP-object>"
}
}
}
f. Run the following AWS CLI command from the folder where you saved the JSON file to
create a plugin.
aws kafkaconnect create-custom-plugin --cli-input-json file://<debezium-source-
custom-plugin.json>
You should see output similar to the following example.
{
"CustomPluginArn": "arn:aws:kafkaconnect:us-east-1:012345678901:custom-
plugin/example-custom-plugin-name/abcd1234-a0b0-1234-c1-12345678abcd-1",
"CustomPluginState": "CREATING",
"Name": "example-custom-plugin-name",
"Revision": 1
}
g. Run the following command to check the plugin state. The state should change from
CREATING to ACTIVE. Replace the ARN placeholder with the ARN that you got in the
output of the previous command.
aws kafkaconnect describe-custom-plugin --custom-plugin-arn "<arn-of-your-
custom-plugin>"
2. Configure AWS Secrets Manager and create a secret for your database credentials
a. Open the Secrets Manager console at https://console.aws.amazon.com/secretsmanager/.
b. Create a new secret to store your database sign-in credentials. For instructions, see Create
a secret in the AWS Secrets Manager User Guide.
c. Copy your secret's ARN.
d. Add the Secrets Manager permissions from the following example policy to
your service execution role. Replace <arn:aws:secretsmanager:us-
east-1:123456789000:secret:MySecret-1234> with the ARN of your secret.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetResourcePolicy",
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource": [
"<arn:aws:secretsmanager:us-east-1:123456789000:secret:MySecret-1234>"
]
}
]
}
For instructions on how to add IAM permissions, see Adding and removing IAM identity
permissions in the IAM User Guide.
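As a sketch of steps 2b and 2c, the following command creates a secret whose keys match the
dbusername and dbpassword references used in the connector configuration later in this
procedure. The secret name and values are placeholders.
aws secretsmanager create-secret \
    --name MySecret-1234 \
    --secret-string '{"dbusername":"<database-user>","dbpassword":"<database-password>"}'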
3. Create a custom worker configuration with information about your configuration provider
a. Copy the following worker configuration properties into a file, replacing the
placeholder strings with values that correspond to your scenario. To learn more
about the configuration properties for the AWS Secrets Manager Config Provider, see
SecretsManagerConfigProvider in the plugin's documentation.
key.converter=<org.apache.kafka.connect.storage.StringConverter>
value.converter=<org.apache.kafka.connect.storage.StringConverter>
config.providers.secretManager.class=com.github.jcustenborder.kafka.config.aws.SecretsManagerConfigProvider
config.providers=secretManager
config.providers.secretManager.param.aws.region=<us-east-1>
b. Run the following AWS CLI command to create your custom worker configuration.
Replace the following values:
<my-worker-config-name> - a descriptive name for your custom worker
configuration
<encoded-properties-file-content-string> - a base64-encoded version of the
plaintext properties that you copied in the previous step
aws kafkaconnect create-worker-configuration --name <my-worker-config-name> --
properties-file-content <encoded-properties-file-content-string>
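As a concrete sketch of the same command, assuming you saved the properties from the
previous step as worker.properties, you can encode and submit the file in one step. The -w 0
flag is GNU coreutils syntax; on macOS, use base64 -i worker.properties instead. The
configuration name is a placeholder.
aws kafkaconnect create-worker-configuration \
    --name <my-worker-config-name> \
    --properties-file-content "$(base64 -w 0 worker.properties)"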
4. Create a connector
a. Copy the following JSON that corresponds to your Debezium version (2.x or 1.x) and paste
it in a new file. Replace the <placeholder> strings with values that correspond to your
scenario. For information about how to set up a service execution role, see the section
called “IAM roles and policies”.
Note that the configuration uses variables like
${secretManager:MySecret-1234:dbusername} instead of plaintext to specify
database credentials. Replace MySecret-1234 with the name of your secret and then
include the name of the key that you want to retrieve. You must also replace <arn-of-
config-provider-worker-configuration> with the ARN of your custom worker
configuration.
Debezium 2.x
For Debezium 2.x versions, copy the following JSON and paste it in a new file. Replace
the <placeholder> strings with values that correspond to your scenario.
{
"connectorConfiguration": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "<aurora-database-writer-instance-endpoint>",
"database.port": "3306",
"database.user": "<${secretManager:MySecret-1234:dbusername}>",
"database.password": "<${secretManager:MySecret-1234:dbpassword}>",
"database.server.id": "123456",
"database.include.list": "<list-of-databases-hosted-by-specified-server>",
"topic.prefix": "<logical-name-of-database-server>",
"schema.history.internal.kafka.topic": "<kafka-topic-used-by-debezium-to-
track-schema-changes>",
"schema.history.internal.kafka.bootstrap.servers": "<cluster-bootstrap-
servers-string>",
"schema.history.internal.consumer.security.protocol": "SASL_SSL",
"schema.history.internal.consumer.sasl.mechanism": "AWS_MSK_IAM",
"schema.history.internal.consumer.sasl.jaas.config":
"software.amazon.msk.auth.iam.IAMLoginModule required;",
"schema.history.internal.consumer.sasl.client.callback.handler.class":
"software.amazon.msk.auth.iam.IAMClientCallbackHandler",
"schema.history.internal.producer.security.protocol": "SASL_SSL",
"schema.history.internal.producer.sasl.mechanism": "AWS_MSK_IAM",
"schema.history.internal.producer.sasl.jaas.config":
"software.amazon.msk.auth.iam.IAMLoginModule required;",
"schema.history.internal.producer.sasl.client.callback.handler.class":
"software.amazon.msk.auth.iam.IAMClientCallbackHandler",
"include.schema.changes": "true"
},
"connectorName": "example-Debezium-source-connector",
"kafkaCluster": {
"apacheKafkaCluster": {
"bootstrapServers": "<cluster-bootstrap-servers-string>",
"vpc": {
"subnets": [
"<cluster-subnet-1>",
"<cluster-subnet-2>",
"<cluster-subnet-3>"
],
"securityGroups": ["<id-of-cluster-security-group>"]
}
}
},
"capacity": {
"provisionedCapacity": {
"mcuCount": 2,
"workerCount": 1
}
},
"kafkaConnectVersion": "2.7.1",
"serviceExecutionRoleArn": "<arn-of-service-execution-role-that-msk-
connect-can-assume>",
"plugins": [{
"customPlugin": {
"customPluginArn": "<arn-of-msk-connect-plugin-that-contains-connector-
code>",
"revision": 1
}
}],
"kafkaClusterEncryptionInTransit": {
"encryptionType": "TLS"
},
"kafkaClusterClientAuthentication": {
"authenticationType": "IAM"
},
"workerConfiguration": {
"workerConfigurationArn": "<arn-of-config-provider-worker-configuration>",
"revision": 1
}
}
Debezium 1.x
For Debezium 1.x versions, copy the following JSON and paste it in a new file. Replace
the <placeholder> strings with values that correspond to your scenario.
{
"connectorConfiguration": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "<aurora-database-writer-instance-endpoint>",
"database.port": "3306",
"database.user": "<${secretManager:MySecret-1234:dbusername}>",
"database.password": "<${secretManager:MySecret-1234:dbpassword}>",
"database.server.id": "123456",
"database.server.name": "<logical-name-of-database-server>",
"database.include.list": "<list-of-databases-hosted-by-specified-server>",
"database.history.kafka.topic": "<kafka-topic-used-by-debezium-to-track-
schema-changes>",
"database.history.kafka.bootstrap.servers": "<cluster-bootstrap-servers-
string>",
"database.history.consumer.security.protocol": "SASL_SSL",
"database.history.consumer.sasl.mechanism": "AWS_MSK_IAM",
"database.history.consumer.sasl.jaas.config":
"software.amazon.msk.auth.iam.IAMLoginModule required;",
"database.history.consumer.sasl.client.callback.handler.class":
"software.amazon.msk.auth.iam.IAMClientCallbackHandler",
"database.history.producer.security.protocol": "SASL_SSL",
"database.history.producer.sasl.mechanism": "AWS_MSK_IAM",
"database.history.producer.sasl.jaas.config":
"software.amazon.msk.auth.iam.IAMLoginModule required;",
"database.history.producer.sasl.client.callback.handler.class":
"software.amazon.msk.auth.iam.IAMClientCallbackHandler",
"include.schema.changes": "true"
},
"connectorName": "example-Debezium-source-connector",
"kafkaCluster": {
"apacheKafkaCluster": {
"bootstrapServers": "<cluster-bootstrap-servers-string>",
"vpc": {
"subnets": [
"<cluster-subnet-1>",
"<cluster-subnet-2>",
"<cluster-subnet-3>"
],
"securityGroups": ["<id-of-cluster-security-group>"]
}
}
},
"capacity": {
"provisionedCapacity": {
"mcuCount": 2,
"workerCount": 1
}
},
"kafkaConnectVersion": "2.7.1",
"serviceExecutionRoleArn": "<arn-of-service-execution-role-that-msk-
connect-can-assume>",
"plugins": [{
"customPlugin": {
"customPluginArn": "<arn-of-msk-connect-plugin-that-contains-connector-
code>",
"revision": 1
}
}],
"kafkaClusterEncryptionInTransit": {
"encryptionType": "TLS"
},
"kafkaClusterClientAuthentication": {
"authenticationType": "IAM"
},
"workerConfiguration": {
"workerConfigurationArn": "<arn-of-config-provider-worker-configuration>",
"revision": 1
}
}
b. Run the following AWS CLI command in the folder where you saved the JSON file in the
previous step.
aws kafkaconnect create-connector --cli-input-json file://connector-info.json
The following is an example of the output that you get when you run the command
successfully.
{
"ConnectorArn": "arn:aws:kafkaconnect:us-east-1:123450006789:connector/
example-Debezium-source-connector/abc12345-abcd-4444-a8b9-123456f513ed-2",
"ConnectorState": "CREATING",
"ConnectorName": "example-Debezium-source-connector"
}
For a Debezium connector example with detailed steps, see Introducing Amazon MSK Connect -
Stream Data to and from Your Apache Kafka Clusters Using Managed Connectors.
Migrate to Amazon MSK Connect
This section describes how to migrate your Apache Kafka connector application to Amazon
Managed Streaming for Apache Kafka Connect (Amazon MSK Connect). To learn more about the
benefits of migrating to Amazon MSK Connect, see the MSK Connect overview earlier in this guide.
This section also describes the state management topics used by Kafka Connect and Amazon MSK
Connect and covers procedures for migrating source and sink connectors.
Understand internal topics used by Kafka Connect
An Apache Kafka Connect application that’s running in distributed mode stores its state by using
internal topics in the Kafka cluster and group membership. The following are the configuration
values that correspond to the internal topics that are used for Kafka Connect applications:
Configuration topic, specified through config.storage.topic
In the configuration topic, Kafka Connect stores the configuration of all the connectors and tasks
that have been started by users. Each time users update the configuration of a connector or
when a connector requests a reconfiguration (for example, the connector detects that it can start
more tasks), a record is emitted to this topic. This topic is compaction enabled, so it always keeps
the last state for each entity.
Offsets topic, specified through offset.storage.topic
In the offsets topic, Kafka Connect stores the offsets of the source connectors. Like the
configuration topic, the offsets topic is compaction enabled. This topic is used to write the source
positions only for source connectors that produce data to Kafka from external systems. Sink
connectors, which read data from Kafka and send to external systems, store their consumer
offsets by using regular Kafka consumer groups.
Status topic, specified through status.storage.topic
In the status topic, Kafka Connect stores the current state of connectors and tasks. This topic is
used as the central place for the data that is queried by users of the REST API. This topic allows
users to query any worker and still get the status of all running plugins. Like the configuration
and offsets topics, the status topic is also compaction enabled.
In addition to these topics, Kafka Connect makes extensive use of Kafka’s group membership API.
The groups are named after the connector name. For example, for a connector named file-sink,
the group is named connect-file-sink. Each consumer in the group provides records to a single
task. These groups and their offsets can be retrieved by using regular consumer groups tools, such
as kafka-consumer-groups.sh. For each sink connector, the Connect runtime runs a regular
consumer group that extracts records from Kafka.
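As a sketch, the following command retrieves the offsets of the consumer group for the
file-sink connector named above; the bootstrap servers string is a placeholder.
kafka-consumer-groups.sh --bootstrap-server <bootstrap-servers> --describe --group connect-file-sink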
State management of Amazon MSK Connect applications
By default, Amazon MSK Connect creates three separate topics in the Kafka cluster for each
Amazon MSK Connector to store the connector’s configuration, offset, and status. The default topic
names are structured as follows:
__msk_connect_configs_connector-name_connector-id
__msk_connect_status_connector-name_connector-id
__msk_connect_offsets_connector-name_connector-id
Note
To provide the offset continuity between source connectors, you can use an offset storage
topic of your choice, instead of the default topic. Specifying an offset storage topic helps
you accomplish tasks like creating a source connector that resumes reading from the last
offset of a previous connector. To specify an offset storage topic, supply a value for the
offset.storage.topic property in the Amazon MSK Connect worker configuration before
creating the connector.
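The following is a minimal sketch of such a worker configuration. The converters mirror common
defaults, and the topic name is a placeholder for the offset storage topic used by your previous
connector.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
offset.storage.topic=<existing-offset-storage-topic>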
Migrate source connectors to Amazon MSK Connect
Source connectors are Apache Kafka Connect applications that import records from external
systems into Kafka. This section describes the process for migrating Apache Kafka Connect source
connector applications that are running on-premises or self-managed Kafka Connect clusters that
are running on AWS to Amazon MSK Connect.
The Kafka Connect source connector application stores offsets in a topic that’s named with the
value that’s set for the config property offset.storage.topic. The following are the sample
offset messages for a JDBC connector that’s running two tasks that import data from two different
tables named movies and shows. The most recent row imported from the table movies has a
primary ID of 18343. The most recent row imported from the shows table has a primary ID of 732.
["jdbcsource",{"protocol":"1","table":"sample.movies"}] {"incrementing":18343}
["jdbcsource",{"protocol":"1","table":"sample.shows"}] {"incrementing":732}
To migrate source connectors to Amazon MSK Connect, do the following:
1. Create an Amazon MSK Connect custom plugin by pulling connector libraries from your on-
premises or self-managed Kafka Connect cluster.
2.
Create Amazon MSK Connect worker properties and set the properties key.converter,
value.converter, and offset.storage.topic to the same values that are set for the
Kafka connector that’s running in your existing Kafka Connect cluster.
3.
Pause the connector application on the existing cluster by making a PUT /
connectors/connector-name/pause request on the existing Kafka Connect cluster (see the
REST sketch after this procedure).
4. Make sure that all of the connector application’s tasks are completely stopped. You can stop
the tasks either by making a GET /connectors/connector-name/status request on the
existing Kafka Connect cluster or by consuming the messages from the topic name that’s set for
the property status.storage.topic.
5. Get the connector configuration from the existing cluster. You can get the connector
configuration either by making a GET /connectors/connector-name/config/ request
on the existing cluster or by consuming the messages from the topic name that’s set for the
property config.storage.topic.
6. Create a new Amazon MSK Connector with the same name as the connector in the existing cluster. Create
this connector by using the connector custom plugin that you created in step 1, the worker
properties that you created in step 2, and the connector configuration that you extracted in step
5.
7.
When the Amazon MSK Connector status is active, view the logs to verify that the connector
has started importing data from the source system.
8.
Delete the connector in the existing cluster by making a DELETE /connectors/connector-
name request.
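As a sketch of steps 3, 4, 5, and 8, the following requests run against the REST API of the
existing Kafka Connect cluster. They assume the REST listener is reachable at localhost:8083 and
use the jdbcsource connector name from the earlier offset example.
# Step 3: pause the connector
curl -X PUT http://localhost:8083/connectors/jdbcsource/pause
# Step 4: confirm that all tasks are stopped
curl http://localhost:8083/connectors/jdbcsource/status
# Step 5: capture the connector configuration
curl http://localhost:8083/connectors/jdbcsource/config
# Step 8: after the Amazon MSK Connector is active, delete the old connector
curl -X DELETE http://localhost:8083/connectors/jdbcsource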
Migrate sink connectors to Amazon MSK Connect
Sink connectors are Apache Kafka Connect applications that export data from Kafka to external
systems. This section describes the process for migrating Apache Kafka Connect sink connector
applications that are running on-premises or self-managed Kafka Connect clusters that are running
on AWS to Amazon MSK Connect.
Kafka Connect sink connectors use the Kafka group membership API and store offsets in the same
__consumer_offset topics as a typical consumer application. This behavior simplifies migration
of the sink connector from a self-managed cluster to Amazon MSK Connect.
To migrate sink connectors to Amazon MSK Connect, do the following:
1. Create an Amazon MSK Connect custom plugin by pulling connector libraries from your on-
premises or self-managed Kafka Connect cluster.
2.
Create Amazon MSK Connect worker properties and set the properties key.converter and
value.converter to the same values that are set for the Kafka connector that’s running in
your existing Kafka Connect cluster.
3.
Pause the connector application on your existing cluster by making a PUT /
connectors/connector-name/pause request on the existing Kafka Connect cluster.
4. Make sure that all of the connector application’s tasks are completely stopped. You can stop
the tasks either by making a GET /connectors/connector-name/status request on the
existing Kafka Connect cluster, or by consuming the messages from the topic name that’s set for
the property status.storage.topic.
5. Get the connector configuration from the existing cluster. You can get the connector
configuration either by making a GET /connectors/connector-name/config request
on the existing cluster, or by consuming the messages from the topic name that’s set for the
property config.storage.topic.
6. Create a new Amazon MSK Connector with the same name as the connector in the existing cluster. Create this
connector by using the connector custom plugin that you created in step 1, the worker
properties that you created in step 2, and the connector configuration that you extracted in step
5.
7.
When the Amazon MSK Connector status is active, view the logs to verify that the connector
has started exporting data to the external system.
8.
Delete the connector in the existing cluster by making a DELETE /connectors/connector-
name request.
Troubleshoot issues in Amazon MSK Connect
The following information can help you troubleshoot problems that you might have while using
MSK Connect. You can also post your issue to AWS re:Post.
Connector is unable to access resources hosted on the public internet
See Enabling internet access for Amazon MSK Connect.
Connector's number of running tasks is not equal to the number of tasks specified in tasks.max
Here are some reasons a connector may use fewer tasks than the specified tasks.max configuration:
Some connector implementations limit the number of tasks that can be used. For example, the
Debezium connector for MySQL is limited to using a single task.
When using autoscaled capacity mode, Amazon MSK Connect overrides a connector's tasks.max
property with a value that is proportional to the number of workers running in the connector
and the number of MCUs per worker.
For sink connectors, the level of parallelism (number of tasks) cannot be more than the number
of topic partitions. Although you can set tasks.max to a larger value, a single partition is never
processed by more than one task at a time.
In Kafka Connect 2.7.x, the default consumer partition assignor is RangeAssignor. The behavior
of this assignor is to give the first partition of every topic to a single consumer, the second
partition of every topic to a single consumer, etc. This means that the maximum number of
active tasks for a sink connector using RangeAssignor is equal to the maximum number of
partitions in any single topic being consumed. If this doesn't work for your use case, you should
create a worker configuration in which the consumer.partition.assignment.strategy
property is set to a more suitable consumer partition assignor, as in the sketch after this
list. See Kafka 2.7 Interface ConsumerPartitionAssignor: All Known Implementing Classes.
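The following one-line worker configuration is a sketch that swaps in RoundRobinAssignor, one
of the implementing classes linked above; whether it suits your workload depends on your topic
and partition layout.
consumer.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor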
What is Amazon MSK Replicator?
Amazon MSK Replicator is an Amazon MSK feature that enables you to reliably replicate data
across Amazon MSK clusters in the same or different AWS Regions. With MSK Replicator,
you can easily build regionally resilient streaming applications for increased availability and
business continuity. MSK Replicator provides automatic asynchronous replication across MSK
clusters, eliminating the need to write custom code, manage infrastructure, or set up cross-region
networking.
MSK Replicator automatically scales the underlying resources so that you can replicate data on
demand without having to monitor or scale capacity. MSK Replicator also replicates the necessary
Kafka metadata, including topic configurations, Access Control Lists (ACLs), and consumer group
offsets. If an unexpected event occurs in a Region, you can fail over to the other AWS Region and
seamlessly resume processing.
MSK Replicator supports both cross-region replication (CRR) and same-region replication (SRR). In
cross-region replication, the source and target MSK clusters are in different AWS Regions. In same-
region replication, both the source and target MSK clusters are in the same AWS Region. You need
to create source and target MSK clusters before using them with MSK Replicator.
Note
MSK Replicator supports the following AWS Regions: US East (us-east-1, N. Virginia);
US East (us-east-2, Ohio); US West (us-west-2, Oregon); Europe (eu-west-1, Ireland);
Europe (eu-central-1, Frankfurt); Asia Pacific (ap-southeast-1, Singapore); Asia Pacific
(ap-southeast-2, Sydney); Europe (eu-north-1, Stockholm); Asia Pacific (ap-south-1,
Mumbai); Europe (eu-west-3, Paris); South America (sa-east-1, São Paulo); Asia Pacific
(ap-northeast-2, Seoul); Europe (eu-west-2, London); Asia Pacific (ap-northeast-1, Tokyo);
US West (us-west-1, N. California); Canada (ca-central-1, Central).
Here are some common uses for Amazon MSK Replicator.
Build multi-region streaming applications: Build highly available and fault-tolerant streaming
applications for increased resiliency without setting up custom solutions.
Lower latency data access: Provide lower latency data access to consumers in different
geographic regions.
Distribute data to your partners: Copy data from one Apache Kafka cluster to many Apache
Kafka clusters, so that different teams/partners have their own copies of data.
Aggregate data for analytics: Copy data from multiple Apache Kafka clusters into one cluster for
easily generating insights on aggregated real-time data.
Write locally, access your data globally: Set up multi-active replication to automatically
propagate writes performed in one AWS Region to other Regions for providing data at lower
latency and cost.
How Amazon MSK Replicator works
To get started with MSK Replicator, you need to create a new Replicator in your target cluster's
AWS Region. MSK Replicator automatically copies all data from the cluster in the primary AWS
Region (called the source) to the cluster in the destination Region (called the target). Source
and target clusters can be in the same or different AWS Regions. You will need to create the
target cluster if it does not already exist.
When you create a Replicator, MSK Replicator deploys all required resources in the target cluster’s
AWS Region to optimize for data replication latency. Replication latency varies based on many
factors, including the network distance between the AWS Regions of your MSK clusters, the
throughput capacity of your source and target clusters, and the number of partitions on your
source and target clusters. MSK Replicator automatically scales the underlying resources so that
you can replicate data on-demand without having to monitor or scale capacity.
Data replication
By default, MSK Replicator copies all data asynchronously from the latest offset in the source
cluster topic partitions to the target cluster. If the "Detect and copy new topics" setting is turned
on, MSK Replicator automatically detects and copies new topics or topic partitions to the target
cluster. However, it may take up to 30 seconds for the Replicator to detect and create the new
topics or topic partitions on the target cluster. Any messages produced to the source topic before
the topic has been created on the target cluster will not be replicated. Alternatively, you can
configure your Replicator during creation to start replication from the earliest offset in the source
cluster topic partitions if you want to replicate existing messages on your topics to the target
cluster.
MSK Replicator does not store your data. Data is consumed from your source cluster, buffered
in-memory and written to the target cluster. The buffer is cleared automatically when the data
is either successfully written or fails after retries. All the communication and data between MSK
Replicator and your clusters are always encrypted in-transit. All MSK Replicator API calls like
DescribeClusterV2, CreateTopic, DescribeTopicDynamicConfiguration are captured in
AWS CloudTrail. Your MSK broker logs will also reflect the same.
MSK Replicator creates topics in the target cluster with a replication factor of 3. If you need to,
you can modify the replication factor directly on the target cluster.
Metadata replication
MSK Replicator also supports copying the metadata from the source cluster to the target cluster.
The metadata includes topic configuration, Access Control Lists (ACLs), and consumer groups
offsets. Like data replication, metadata replication also happens asynchronously. For better
performance, MSK Replicator prioritizes data replication over metadata replication.
The following table lists the Access Control Lists (ACLs) that MSK Replicator copies, in the form
Operation (resource type): APIs allowed.

Alter (Topic): CreatePartitions
AlterConfigs (Topic): AlterConfigs
Create (Topic): CreateTopics, Metadata
Delete (Topic): DeleteRecords, DeleteTopics
Describe (Topic): ListOffsets, Metadata, OffsetFetch, OffsetForLeaderEpoch
DescribeConfigs (Topic): DescribeConfigs
Read (Topic): Fetch, OffsetCommit, TxnOffsetCommit
Write, deny only (Topic): Produce, AddPartitionsToTxn
MSK Replicator copies LITERAL pattern type ACLs only for resource type Topic. PREFIXED pattern
type ACLs and other resource type ACLs are not copied. MSK Replicator also does not delete ACLs
on the target cluster. If you delete an ACL on the source cluster, you should also delete it on the
target cluster at the same time. For more details on Kafka ACL resources, patterns, and
operations, see https://kafka.apache.org/documentation/#security_authz_cli.
MSK Replicator replicates only Kafka ACLs; it does not replicate the IAM policies used with IAM
access control. If your clients use IAM access control to read from or write to your MSK clusters,
you need to configure the relevant IAM policies on your target cluster as well for seamless
failover. This is true for both prefixed and identical topic name replication configurations.
As part of consumer groups offsets syncing, MSK Replicator optimizes for your consumers on the
source cluster which are reading from a position closer to the tip of the stream (end of the topic
partition). If your consumer groups are lagging on the source cluster, you may see higher lag for
those consumer groups on the target as compared to the source. This means after failover to the
target cluster, your consumers will reprocess more duplicate messages. To reduce this lag, your
consumers on the source cluster would need to catch up and start consuming from the tip of the
stream (end of the topic partition). As your consumers catch up, MSK Replicator will automatically
reduce the lag.
Topic name configuration
MSK Replicator has two topic name configuration modes: Prefixed (default) or Identical topic name
replication.
Prefixed topic name replication
By default, MSK Replicator creates new topics in the target cluster with an auto-generated prefix
added to the source cluster topic name, such as <sourceKafkaClusterAlias>.topic. This is to
distinguish the replicated topics from others in the target cluster and to avoid circular replication of
data between the clusters.
For example, MSK Replicator replicates data in a topic named “topic” from the source cluster to
a new topic in the target cluster called <sourceKafkaClusterAlias>.topic. You can find the prefix
that will be added to the topic names in the target cluster under the sourceKafkaClusterAlias field
using DescribeReplicator API or the Replicator details page on the MSK console. The prefix in
the target cluster is <sourceKafkaClusterAlias>.
To make sure your consumers can reliably restart processing from the standby cluster, you need to
configure your consumers to read data from the topics using a wildcard operator .*. For example,
your consumers would need to consume using .*topic1 in both AWS Regions. This example would
also include a topic such as footopic1, so adjust the wildcard operator according to your needs.
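As a sketch, the following console consumer command reads from all topics matching the
wildcard in the target cluster. The --whitelist flag matches the Kafka 2.7-era tooling discussed in
this guide; newer Kafka distributions name the same option --include. The bootstrap servers
string is a placeholder.
kafka-console-consumer.sh --bootstrap-server <target-cluster-bootstrap-servers> --whitelist '.*topic1' --from-beginning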
You should use prefixed topic name replication when you want to keep replicated data in a
separate topic in the target cluster, such as for active-active cluster setups.
Identical topic name replication
As an alternative to the default setting, Amazon MSK Replicator allows you to create a Replicator
with topic replication set to Identical topic name replication (Keep the same topics name in
console). You can create a new Replicator in the AWS Region that contains your target MSK cluster.
Identically-named replicated topics let you avoid reconfiguring clients to read from replicated
topics.
Identical topic name replication (Keep the same topics name in console) has the following
advantages:
• Allows you to retain identical topic names during the replication process, while automatically
avoiding the risk of infinite replication loops.
• Makes setting up and operating multi-cluster streaming architectures simpler, because you can
avoid reconfiguring clients to read from the replicated topics.
• For active-passive cluster architectures, streamlines the failover process, allowing applications
to seamlessly fail over to a standby cluster without requiring any topic name changes or client
reconfiguration.
• Can be used to more easily consolidate data from multiple MSK clusters into a single cluster for
data aggregation or centralized analytics. This requires you to create a separate Replicator for
each source cluster, all with the same target cluster.
• Can streamline data migration from one MSK cluster to another by replicating data to identically
named topics in the target cluster.
Amazon MSK Replicator uses Kafka headers to automatically avoid data being replicated back to
the topic it originated from, eliminating the risk of infinite cycles during replication. A header is
a key-value pair that can be included with the key, value, and timestamp in each Kafka message.
MSK Replicator embeds identifiers for the source cluster and topic into the header of each record
being replicated, and uses this header information to avoid infinite replication loops. You
should verify that your clients are able to read replicated data as expected.
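One way to spot-check the replicated records is to print their headers with the console consumer,
as in the following sketch; the bootstrap broker string, client.properties file, and topic name are
placeholders.

# Print record headers to inspect the replication metadata that MSK
# Replicator embeds in each replicated record.
bin/kafka-console-consumer.sh \
  --bootstrap-server <target-bootstrap-brokers> \
  --consumer.config client.properties \
  --topic <replicatedTopicName> \
  --from-beginning \
  --property print.headers=true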
Tutorial: Set up source and target clusters for Amazon MSK
Replicator
This tutorial shows you how to set up a source cluster and a target cluster in the same AWS Region
or in different AWS Regions. You then use those clusters to create an Amazon MSK Replicator.
Prepare the Amazon MSK source cluster
If you already have an MSK source cluster created for the MSK Replicator, make sure that it
meets the requirements described in this section. Otherwise, follow these steps to create an MSK
provisioned or serverless source cluster.
The processes for creating a cross-Region and a same-Region MSK Replicator source cluster are
similar. Differences are called out in the following procedures.
1. Create an MSK provisioned or serverless cluster with IAM access control turned on in the source
region. Your source cluster must have a minimum of three brokers.
2. For a cross-region MSK Replicator, if the source is a provisioned cluster, configure it with
multi-VPC private connectivity turned on for IAM access control schemes. Note that the
unauthenticated auth type is not supported when multi-VPC is turned on. You do not need
to turn on multi-VPC private connectivity for other authentication schemes (mTLS or SASL/
SCRAM). You can simultaneously use mTLS or SASL/SCRAM auth schemes for your other clients
connecting to your MSK cluster. You can configure multi-VPC private connectivity in the Network
settings section of the cluster details page in the console, or with the UpdateConnectivity API.
See Cluster owner turns on multi-VPC. If your source cluster is an MSK Serverless cluster, you do
not need to turn on multi-VPC private connectivity.
For a same-region MSK Replicator, the MSK source cluster does not require multi-VPC private
connectivity and the cluster can still be accessed by other clients using the unauthenticated auth
type.
3. For cross-region MSK Replicators, you must attach a resource-based permissions policy to the
source cluster. This allows MSK to connect to this cluster for replicating data. You can do this
using the CLI or AWS Console procedures below. See also, Amazon MSK resource-based policies.
You do not need to perform this step for same-region MSK Replicators.
Console: create resource policy
Update the source cluster policy with the following JSON. Replace the placeholder with the ARN
of your source cluster.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "kafka.amazonaws.com"
                ]
            },
            "Action": [
                "kafka:CreateVpcConnection",
                "kafka:GetBootstrapBrokers",
                "kafka:DescribeClusterV2"
            ],
            "Resource": "<sourceClusterARN>"
        }
    ]
}
Use the Edit cluster policy option under the Actions menu on the cluster details page.
CLI: create resource policy
Note: If you use the AWS console to create a source cluster and choose the option to create
a new IAM role, AWS attaches the required trust policy to the role. If you want MSK to use
an existing IAM role, or if you create a role on your own, attach the required trust policy to
that role so that MSK Replicator can assume it. For information about how to modify the trust
relationship of a role, see Modifying a Role.
1. Get the current version of the MSK cluster policy using this command. Replace placeholders
with the actual cluster ARN.
aws kafka get-cluster-policy --cluster-arn <Cluster ARN>
{
    "CurrentVersion": "K1PA6795UKMGR7",
    "Policy": "..."
}
2. Create a resource-based policy to allow MSK Replicator to access your source cluster. Use
the following syntax as a template, replacing the placeholder with the actual source cluster
ARN.
aws kafka put-cluster-policy --cluster-arn "<sourceClusterARN>" --policy '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "kafka.amazonaws.com"
                ]
            },
            "Action": [
                "kafka:CreateVpcConnection",
                "kafka:GetBootstrapBrokers",
                "kafka:DescribeClusterV2"
            ],
            "Resource": "<sourceClusterARN>"
        }
    ]
}'
Prepare the Amazon MSK target cluster
Create an MSK target cluster (provisioned or serverless) with IAM access control turned on. The
target cluster doesn’t require multi-VPC private connectivity. The target cluster can be in the same
AWS Region as the source cluster or in a different Region. Both the source and target clusters must
be in the same AWS account. Your target cluster must have a minimum of three
brokers.
Tutorial: Create an Amazon MSK Replicator
After you set up the source and target clusters, you can use those clusters to create an Amazon
MSK Replicator. Before you create the Amazon MSK Replicator, make sure that you have the IAM
permissions required to create an MSK Replicator.
Topics
• Considerations for creating an Amazon MSK Replicator
• IAM permissions required to create an MSK Replicator
• Supported cluster types and versions for MSK Replicator
• Supported MSK Serverless cluster configuration
• Cluster configuration changes
• Create replicator using the AWS console in the target cluster Region
• Choose your source cluster
• Choose your target cluster
• Configure replicator settings and permissions
Considerations for creating an Amazon MSK Replicator
The following sections give an overview of the prerequisites, supported configurations, and
best practices for using the MSK Replicator feature. They cover the necessary permissions, cluster
compatibility, and Serverless-specific requirements, as well as guidance on managing the
Replicator after creation.
IAM permissions required to create an MSK Replicator
Here is an example of the IAM policy required to create an MSK Replicator. The action
kafka:TagResource is only needed if tags are provided when creating the MSK Replicator.
Replicator IAM policies should be attached to the IAM role that corresponds to your client. For
information on creating authorization policies, see Create authorization policies.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "iam:CreateServiceLinkedRole",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:CreateNetworkInterface",
                "ec2:DescribeVpcs",
                "kafka:CreateReplicator",
                "kafka:TagResource"
            ],
            "Resource": "*"
        }
    ]
}
The following is an example IAM policy for describing a replicator. Either the
kafka:DescribeReplicator action or the kafka:ListTagsForResource action is needed, not
both.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "kafka:DescribeReplicator",
                "kafka:ListTagsForResource"
            ],
            "Resource": "*"
        }
    ]
}
Supported cluster types and versions for MSK Replicator
These are requirements for supported instance types, Kafka versions, and network configurations.
• MSK Replicator supports both MSK provisioned clusters and MSK Serverless clusters in any
combination as source and target clusters. Other types of Kafka clusters are not supported by
MSK Replicator at this time.
• MSK Serverless clusters require IAM access control, don't support Apache Kafka ACL replication,
and provide limited support for topic configuration replication. See What is MSK Serverless?.
• MSK Replicator is supported only on clusters running Apache Kafka 2.7.0 or higher, regardless of
whether your source and target clusters are in the same or in different AWS Regions.
• MSK Replicator supports clusters using instance types of m5.large or larger. t3.small clusters are
not supported.
• If you are using MSK Replicator with an MSK provisioned cluster, you need a minimum of three
brokers in both the source and target clusters. You can replicate data across clusters in two
Availability Zones, but you need a minimum of four brokers in those clusters.
• Both your source and target MSK clusters must be in the same AWS account. Replication across
clusters in different accounts is not supported.
• If the source and target MSK clusters are in different AWS Regions (cross-Region), MSK Replicator
requires the source cluster to have multi-VPC private connectivity turned on for its IAM access
control method. Multi-VPC is not required for other authentication methods on the source
cluster. Multi-VPC is not required if you are replicating data between clusters in the same AWS
Region. See the section called “Multi-VPC private connectivity in a single Region”.
• Identical topic name replication (Keep the same topics name in console) requires an MSK cluster
running Kafka version 2.8.1 or higher.
• For Identical topic name replication (Keep the same topics name in console) configurations,
to avoid the risk of cyclic replication, do not make changes to the headers that MSK Replicator
creates (__mskmr).
Supported MSK Serverless cluster configuration
• MSK Serverless supports replication of these topic configurations for MSK Serverless target
clusters during topic creation: cleanup.policy, compression.type, max.message.bytes,
retention.bytes, retention.ms.
• MSK Serverless supports only these topic configurations during topic configuration sync:
compression.type, max.message.bytes, retention.bytes, retention.ms.
• Replicator uses 83 compacted partitions on target MSK Serverless clusters. Make sure that target
MSK Serverless clusters have a sufficient number of compacted partitions. See MSK Serverless
quota.
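To verify which configurations were applied to a replicated topic on the target cluster, you can
describe the topic with the Kafka configs tool, as in the following sketch; the bootstrap broker
string, client.properties file, and topic name are placeholders.

# Describe the configuration of a replicated topic on the target cluster
# to confirm which settings (for example, retention.ms) were synced.
bin/kafka-configs.sh \
  --bootstrap-server <target-bootstrap-brokers> \
  --command-config client.properties \
  --entity-type topics \
  --entity-name <sourceKafkaClusterAlias>.topic1 \
  --describe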
Cluster configuration changes
• It’s recommended that you do not turn tiered storage on or off after the MSK Replicator has
been created. If your target cluster is not tiered, then MSK won’t copy the tiered storage
configurations, regardless of whether your source cluster is tiered or not. If you turn on tiered
storage on the target cluster after the Replicator is created, the Replicator needs to be recreated.
If you want to copy data from a non-tiered to a tiered cluster, you should not copy topic
configurations. See Enabling and disabling tiered storage on an existing topic.
• Don’t change cluster configuration settings after MSK Replicator creation. Cluster configuration
settings are validated during MSK Replicator creation. To avoid problems with the MSK
Replicator, don’t make the following changes after the MSK Replicator is created:
  • Change the MSK cluster to a t3 instance type.
  • Change service execution role permissions.
  • Disable MSK multi-VPC private connectivity.
  • Change the attached cluster resource-based policy.
  • Change cluster security group rules.
Create replicator using the AWS console in the target cluster Region
The following section explains the step-by-step console workflow for creating a Replicator.
Replicator details
1. In the AWS Region where your target MSK cluster is located, open the Amazon MSK console at
https://console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. Choose Replicators to display the list of replicators in the account.
3. Choose Create replicator.
4. In the Replicator details pane, give the new replicator a unique name.
Choose your source cluster
The source cluster contains the data you want to copy to a target MSK cluster.
1. In the Source cluster pane, choose the AWS Region where the source cluster is located.
You can look up a cluster's Region by going to MSK Clusters and looking at the Cluster details
ARN. The Region name is embedded in the ARN string. In the following example ARN, ap-
southeast-2 is the cluster's Region.
arn:aws:kafka:ap-southeast-2:123456789012:cluster/cluster-11/
eec93c7f-4e8b-4baf-89fb-95de01ee639c-s1
2. Enter the ARN of your source cluster or browse to choose your source cluster.
3. Choose subnet(s) for your source cluster.
The console displays the subnets available in the source cluster's Region for you to select. You
must select a minimum of two subnets. For a same-Region MSK Replicator, the subnets that
you select to access the source cluster and the subnets to access the target cluster must be
in the same Availability Zones.
4. Choose security group(s) for the MSK Replicator to access your source cluster.
For cross-region replication (CRR), you do not need to provide security group(s) for your
source cluster.
For same region replication (SRR), go to the Amazon EC2 console at https://
console.aws.amazon.com/ec2/ and ensure that the security groups you will provide for the
Replicator have outbound rules to allow traffic to your source cluster's security groups. Also,
ensure that your source cluster's security groups have inbound rules that allow traffic from
the Replicator security groups provided for the source.
To add inbound rules to your source cluster’s security group:
1. In the AWS console, go to your source cluster’s details by selecting the Cluster name.
2. Select the Properties tab, then scroll down to the Network settings pane to select the
name of the Security group applied.
3. Go to the inbound rules and select Edit inbound rules.
4. Select Add rule.
5. In the Type column for the new rule, select Custom TCP.
6. In the Port range column, type 9098. MSK Replicator uses IAM access control to connect
to your cluster, which uses port 9098.
7. In the Source column, type the name of the security group that you will provide during
Replicator creation for the source cluster (this may be the same as the MSK source
cluster's security group), and then select Save rules.
To add outbound rules to the Replicator’s security group provided for the source:
1. In the AWS console for Amazon EC2, go to the security group that you will provide
during Replicator creation for the source.
2. Go to the outbound rules and select Edit outbound rules.
3. Select Add rule.
4. In the Type column for the new rule, select Custom TCP.
5. In the Port range column, type 9098. MSK Replicator uses IAM access control to connect
to your cluster, which uses port 9098.
6. In the Destination column, type the name of the MSK source cluster's security group, and
then select Save rules.
Note
Alternatively, if you do not want to restrict traffic using your security groups, you can add
inbound and outbound rules allowing All Traffic.
1. Select Add rule.
2. In the Type column, select All Traffic.
3. In the Source column, type 0.0.0.0/0, and then select Save rules.
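If you prefer the AWS CLI to the console for these rules, the following sketch adds the equivalent
inbound and outbound rules; the security group IDs are placeholders.

# Inbound rule on the source cluster's security group: allow TCP 9098
# from the security group you will provide for the Replicator.
aws ec2 authorize-security-group-ingress \
  --group-id <sourceClusterSecurityGroupId> \
  --protocol tcp \
  --port 9098 \
  --source-group <replicatorSecurityGroupId>

# Outbound rule on the Replicator's security group: allow TCP 9098
# to the source cluster's security group.
aws ec2 authorize-security-group-egress \
  --group-id <replicatorSecurityGroupId> \
  --ip-permissions 'IpProtocol=tcp,FromPort=9098,ToPort=9098,UserIdGroupPairs=[{GroupId=<sourceClusterSecurityGroupId>}]'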
Choose your target cluster
The target cluster is the MSK provisioned or serverless cluster to which the source data is copied.
Note
MSK Replicator creates new topics in the target cluster with an auto-generated
prefix added to the topic name. For instance, MSK Replicator replicates data
in “topic” from the source cluster to a new topic in the target cluster called
<sourceKafkaClusterAlias>.topic. This is to distinguish topics that contain data
replicated from the source cluster from other topics in the target cluster and to avoid data
being circularly replicated between the clusters. You can find the prefix that will be added
to the topic names in the target cluster in the sourceKafkaClusterAlias field, using the
DescribeReplicator API or the Replicator details page on the MSK console. The prefix
in the target cluster is <sourceKafkaClusterAlias>.
1. In the Target cluster pane, choose the AWS Region where the target cluster is located.
2. Enter the ARN of your target cluster or browse to choose your target cluster.
3. Choose subnet(s) for your target cluster.
The console displays subnets available in the target cluster’s Region for you to select. Select a
minimum of two subnets.
4. Choose security group(s) for the MSK Replicator to access your target cluster.
The security groups available in the target cluster’s Region are displayed for you to select. The
chosen security group is associated with each connection. For more information about using
security groups, see Control traffic to your AWS resources using security groups in the
Amazon VPC User Guide.
For both cross region replication (CRR) and same region replication (SRR), go to the Amazon
EC2 console at https://console.aws.amazon.com/ec2/ and ensure that the security groups
you will provide to the Replicator have outbound rules to allow traffic to your target cluster's
security groups. Also ensure that your target cluster's security groups have inbound rules
that accept traffic from the Replicator security groups provided for the target.
To add inbound rules to your target cluster’s security group:
1. In the AWS console, go to your target cluster’s details by selecting the Cluster name.
2. Select the Properties tab, then scroll down to the Network settings pane to select the
name of the Security group applied.
3. Go to the inbound rules and select Edit inbound rules.
4. Select Add rule.
5. In the Type column for the new rule, select Custom TCP.
6. In the Port range column, type 9098. MSK Replicator uses IAM access control to connect
to your cluster, which uses port 9098.
7. In the Source column, type the name of the security group that you will provide during
Replicator creation for the target cluster (this may be the same as the MSK target cluster's
security group), and then select Save rules.
To add outbound rules to the Replicator’s security group provided for the target:
1. In the AWS console, go to the security group that you will provide during Replicator
creation for the target.
2. Go to the outbound rules and select Edit outbound rules.
3. Select Add rule.
4. In the Type column for the new rule, select Custom TCP.
5. In the Port range column, type 9098. MSK Replicator uses IAM access control to connect
to your cluster, which uses port 9098.
6. In the Destination column, type the name of the MSK target cluster’s security group, and
then select Save rules.
Note
Alternatively, if you do not want to restrict traffic using your security groups, you can add
inbound and outbound rules allowing All Traffic.
1. Select Add rule.
2. In the Type column, select All Traffic.
3. In the Source column, type 0.0.0.0/0, and then select Save rules.
Configure replicator settings and permissions
1. In the Replicator settings pane, specify the topics you want to replicate using regular
expressions in the allow and deny lists. By default, all topics are replicated.
Note
MSK Replicator only replicates up to 750 topics in sorted order. If you need to replicate
more topics, we recommend that you create a separate Replicator. Go to the AWS
console Support Center and create a support case if you need support for more than
750 topics per Replicator. You can monitor the number of topics being replicated using
the "TopicCount" metric. See Amazon MSK quota.
2. By default, MSK Replicator starts replication from the latest (most recent) offset in the selected
topics. Alternatively, you can start replication from the earliest (oldest) offset in the selected
topics if you want to replicate existing data on your topics. Once the Replicator is created, you
can’t change this setting. This setting corresponds to the startingPosition field in the
CreateReplicator request and DescribeReplicator response APIs.
3. Choose a topic name configuration:
• PREFIXED topic name replication (Add prefix to topics name in console): The default
setting. MSK Replicator replicates “topic1” from the source cluster to a new topic in the
target cluster with the name <sourceKafkaClusterAlias>.topic1.
• Identical topic name replication (Keep the same topics name in console): Topics from the
source cluster are replicated with identical topic names in the target cluster.
This setting corresponds to the TopicNameConfiguration field in the CreateReplicator
request and DescribeReplicator response APIs. See How Amazon MSK Replicator works.
Note
By default, MSK Replicator creates new topics in the target cluster with an auto-
generated prefix added to the topic name. This is to distinguish topics that contain
data replicated from the source cluster from other topics in the target cluster and to
avoid data being circularly replicated between the clusters. Alternatively, you can create
an MSK Replicator with Identical topic name replication (Keep the same topics name
in console) so that topic names are preserved during replication. This configuration
reduces the need for you to reconfigure client applications during setup and makes it
simpler to operate multi-cluster streaming architectures.
4. By default, MSK Replicator copies all metadata including topic configurations, Access Control
Lists (ACLs) and consumer group offsets for seamless failover. If you are not creating the
Replicator for failover, you can optionally choose to turn off one or more of these settings
available in the Additional settings section.
Note
MSK Replicator does not replicate write ACLs since your producers should not be
writing directly to the replicated topic in the target cluster. Your producers should write
to the local topic in the target cluster after failover. See Perform a planned failover to
the secondary AWS Region for details.
5. In the Consumer group replication pane, specify the consumer groups you want to replicate
using regular expressions in the allow and deny lists. By default, all consumer groups are
replicated.
6. In the Compression pane, you can optionally choose to compress the data written to the
target cluster. If you’re going to use compression, we recommend that you use the same
compression method as the data in your source cluster.
7. In the Access permissions pane, do either of the following:
a. Select Create or update IAM role with required policies. The MSK console automatically
attaches the necessary permissions and trust policy to the service execution role required to
read and write to your source and target MSK clusters.
b. Provide your own IAM role by selecting Choose from IAM roles that Amazon MSK can
assume. We recommend that you attach the AWSMSKReplicatorExecutionRole
managed IAM policy to your service execution role, instead of writing your own IAM policy.
Create the IAM role that the Replicator will use to read and write to your source
and target MSK clusters with the following JSON as the trust policy and the
AWSMSKReplicatorExecutionRole policy attached to the role. In the trust policy, replace
the placeholder <yourAccountID> with your actual account ID.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "kafka.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "<yourAccountID>"
                }
            }
        }
    ]
}
8. In the Replicator tags pane, you can optionally assign tags to the MSK Replicator resource. For
more information, see Tag an Amazon MSK cluster. For a cross-Region MSK Replicator, tags are
synced to the remote Region automatically when the Replicator is created. If you change tags
after the Replicator is created, the change is not automatically synced to the remote Region, so
you’ll need to sync the tags on the local and remote Replicator references manually.
9. Select Create.
If you want to restrict the kafka-cluster:WriteData permission, refer to the Create
authorization policies section of How IAM access control for Amazon MSK works. You'll need
to add the kafka-cluster:WriteDataIdempotently permission on both the source and
target clusters.
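As a sketch, an IAM statement granting idempotent writes on both clusters could look like the
following; the cluster ARNs are placeholders.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "kafka-cluster:WriteDataIdempotently",
            "Resource": [
                "<sourceClusterARN>",
                "<targetClusterARN>"
            ]
        }
    ]
}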
It takes approximately 30 minutes for the MSK Replicator to be successfully created and
transition to RUNNING status.
If you create a new MSK Replicator to replace one that you deleted, the new Replicator starts
replication from the latest offset.
If your MSK Replicator has transitioned to a FAILED status, see the troubleshooting section
Troubleshoot MSK Replicator.
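If you prefer to script Replicator creation instead of using the console, the CreateReplicator API
exposes the same options through the AWS CLI. The following is a minimal sketch run in the
target cluster's Region; all ARNs, subnet IDs, and security group IDs are placeholders, and you
should verify the field names against the current CLI reference.

# Create a Replicator that copies all topics and consumer groups,
# starting from the latest offset.
aws kafka create-replicator \
  --replicator-name my-replicator \
  --service-execution-role-arn <replicatorRoleARN> \
  --kafka-clusters '[
    {"AmazonMskCluster": {"MskClusterArn": "<sourceClusterARN>"},
     "VpcConfig": {"SubnetIds": ["<subnetId1>", "<subnetId2>"],
                   "SecurityGroupIds": ["<sourceSecurityGroupId>"]}},
    {"AmazonMskCluster": {"MskClusterArn": "<targetClusterARN>"},
     "VpcConfig": {"SubnetIds": ["<subnetId3>", "<subnetId4>"],
                   "SecurityGroupIds": ["<targetSecurityGroupId>"]}}
  ]' \
  --replication-info-list '[
    {"SourceKafkaClusterArn": "<sourceClusterARN>",
     "TargetKafkaClusterArn": "<targetClusterARN>",
     "TargetCompressionType": "NONE",
     "TopicReplication": {"TopicsToReplicate": [".*"],
                          "StartingPosition": {"Type": "LATEST"}},
     "ConsumerGroupReplication": {"ConsumerGroupsToReplicate": [".*"]}}
  ]'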
Edit MSK Replicator settings
You can’t change the source cluster, target cluster, Replicator starting position, or topic name
replication configuration once the MSK Replicator has been created. To switch to the Identical
topic name replication configuration, you need to create a new Replicator. However, you can edit
other Replicator settings, such as the topics and consumer groups to replicate.
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. In the left navigation pane, choose Replicators to display the list of Replicators in the account
and select the MSK Replicator you want to edit.
3. Choose the Properties tab.
4. In the Replicator settings section, choose Edit replicator.
5. You can edit the MSK Replicator settings by changing any of the following:
• Specify the topics you want to replicate using regular expressions in the allow and deny lists.
• By default, MSK Replicator copies all metadata, including topic configurations, Access Control
Lists (ACLs), and consumer group offsets, for seamless failover. If you are not creating the
Replicator for failover, you can optionally choose to turn off one or more of these settings
available in the Additional settings section.
Note
MSK Replicator does not replicate write ACLs since your producers should not be
writing directly to the replicated topic in the target cluster. Your producers should
write to the local topic in the target cluster after failover. See Perform a planned
failover to the secondary AWS Region for details.
• For Consumer group replication, you can specify the consumer groups you want to
replicate using regular expressions in the allow and deny lists. By default, all consumer
groups are replicated. If the allow and deny lists are empty, consumer group replication is
turned off.
• Under Target compression type, you can choose whether to compress the data written to
the target cluster. If you’re going to use compression, we recommend that you use the same
compression method as the data in your source cluster.
6. Save your changes.
It takes approximately 30 minutes for the MSK Replicator to be successfully created and
transition to RUNNING status. If your MSK Replicator has transitioned to a FAILED status, see
the troubleshooting section Troubleshoot MSK Replicator.
Delete an MSK Replicator
You may need to delete an MSK Replicator if it fails to create (FAILED status). The source and target
clusters assigned to an MSK Replicator can’t be changed once the MSK Replicator is created. You
can delete an existing MSK Replicator and create a new one. If you create a new MSK Replicator to
replace the deleted one, the new Replicator starts replication from the latest offset.
1. In the AWS Region where your source cluster is located, sign in to the AWS Management
Console, and open the Amazon MSK console at https://console.aws.amazon.com/msk/home?
region=us-east-1#/home/.
2. In the navigation pane, select Replicators.
3. From the list of MSK Replicators, select the one you want to delete and choose Delete.
Monitor replication
You can use https://console.aws.amazon.com/cloudwatch/ in the target cluster Region to view
metrics for ReplicationLatency, MessageLag, and ReplicatorThroughput at a topic and
aggregate level for each Amazon MSK Replicator. Metrics are visible under ReplicatorName in the
AWS/Kafka” namespace. You can also see ReplicatorFailure, AuthError and ThrottleTime
metrics to check for issues.
The MSK console displays a subset of CloudWatch metrics for each MSK Replicator. From the
console Replicator list, select the name of a Replicator and select the Monitoring tab.
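You can also retrieve these metrics with the AWS CLI. The following sketch pulls the maximum
ReplicationLatency for a Replicator over one hour; the Replicator name and time window are
placeholders.

# Fetch the maximum replication latency, in 5-minute periods, for one hour.
aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name ReplicationLatency \
  --dimensions Name=ReplicatorName,Value=<replicatorName> \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 300 \
  --statistics Maximum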
MSK Replicator metrics
The following describes the performance and connection metrics for MSK Replicator. For each
metric, the dimensions, unit, raw metric granularity, and raw metric aggregation statistic are listed.
AuthError metrics do not cover topic-level auth errors. To monitor your MSK Replicator's topic-level
auth errors, monitor the Replicator's ReplicationLatency metric and the source cluster's topic-level
MessagesInPerSec metric. If a topic's ReplicationLatency drops to 0 but the topic still has data
being produced to it, the Replicator has an auth issue with the topic. Check that the Replicator's
service execution IAM role has sufficient permission to access the topic.
ReplicationLatency (Performance)
Description: The time it takes records to replicate from the source to the target cluster; that is,
the duration between when a record is produced at the source and when it is replicated to the
target. If ReplicationLatency increases, check whether the clusters have enough partitions to
support replication. High replication latency can occur when the partition count is too low for
high throughput.
Dimensions: ReplicatorName; ReplicatorName, Topic
Unit: Milliseconds
Raw metric granularity: Partition
Raw metric aggregation stat: Maximum

MessageLag (Performance)
Description: Monitors the sync between the MSK Replicator and the source cluster. MessageLag
indicates the lag between the messages produced to the source cluster and the messages
consumed by the Replicator. It is not the lag between the source and target clusters. Even if the
source cluster is unavailable or interrupted, the Replicator will finish writing the messages it has
consumed to the target cluster. After an outage, MessageLag shows an increase indicating the
number of messages the Replicator is behind the source cluster; you can monitor it until it returns
to 0, which shows that the Replicator has caught up with the source cluster.
Dimensions: ReplicatorName; ReplicatorName, Topic
Unit: Count
Raw metric granularity: Partition
Raw metric aggregation stat: Sum

ReplicatorBytesInPerSec (Performance)
Description: The average number of bytes processed by the Replicator per second. Data processed
by MSK Replicator consists of all the data that MSK Replicator receives, which includes the data
replicated to the target cluster and, only if your Replicator is configured with the Identical topic
name configuration, the data filtered by MSK Replicator to prevent data being copied back to
the same topic it originated from. If your Replicator is configured with the Prefixed topic name
configuration, the ReplicatorBytesInPerSec and ReplicatorThroughput metrics have the same value,
as no data is filtered by MSK Replicator.
Dimensions: ReplicatorName
Unit: BytesPerSecond
Raw metric granularity: ReplicatorName
Raw metric aggregation stat: Sum

ReplicatorThroughput (Performance)
Description: The average number of bytes replicated per second. If ReplicatorThroughput drops
for a topic, check the KafkaClusterPingSuccessCount and AuthError metrics to ensure that the
Replicator can communicate with the clusters, then check cluster metrics to ensure the cluster is
not down.
Dimensions: ReplicatorName; ReplicatorName, Topic
Unit: BytesPerSecond
Raw metric granularity: Partition
Raw metric aggregation stat: Sum

AuthError (Debug)
Description: The number of connections with failed authentication per second. If this metric is
above 0, check that the service execution role policy for the Replicator is valid and make sure
there aren't deny permissions set in the cluster permissions. Based on the ClusterAlias dimension,
you can identify whether the source or target cluster is experiencing auth errors.
Dimensions: ReplicatorName, ClusterAlias
Unit: Count
Raw metric granularity: Worker
Raw metric aggregation stat: Sum

ThrottleTime (Debug)
Description: The average time in milliseconds a request was throttled by brokers on the cluster.
Set throttling to avoid having the MSK Replicator overwhelm the cluster. If this metric is 0,
ReplicationLatency is not high, and ReplicatorThroughput is as expected, then throttling is working
as expected. If this metric is above 0, you can adjust throttling accordingly.
Dimensions: ReplicatorName, ClusterAlias
Unit: Milliseconds
Raw metric granularity: Worker
Raw metric aggregation stat: Maximum

ReplicatorFailure (Debug)
Description: The number of failures that the Replicator is experiencing.
Dimensions: ReplicatorName
Unit: Count
Raw metric aggregation stat: Sum

KafkaClusterPingSuccessCount (Debug)
Description: Indicates the health of the Replicator connection to the Kafka cluster. If this value is
1, the connection is healthy. If the value is 0 or there is no datapoint, the connection is unhealthy.
If the value is 0, check the network or IAM permission settings for the Kafka cluster. Based on the
ClusterAlias dimension, you can identify whether this metric is for the source or target cluster.
Dimensions: ReplicatorName, ClusterAlias
Unit: Count
Raw metric aggregation stat: Sum
Use replication to increase the resiliency of a Kafka streaming
application across Regions
You can use MSK Replicator to set up active-active or active-passive cluster topologies to increase
the resiliency of your Apache Kafka application across AWS Regions. In an active-active setup, both
MSK clusters are actively serving reads and writes. In an active-passive setup, only one MSK cluster
at a time is actively serving streaming data, while the other cluster is on standby.
Considerations for building multi-Region Apache Kafka applications
Your consumers must be able to reprocess duplicate messages without downstream impact. MSK
Replicator replicates data at least once, which may result in duplicates in the standby cluster.
When you switch over to the secondary AWS Region, your consumers may process the same
data more than once. MSK Replicator prioritizes copying data over consumer offsets for better
performance. After a failover, the consumer may start reading from earlier offsets, resulting in
duplicate processing.
Producers and consumers must also tolerate minimal data loss. Because MSK Replicator replicates
data asynchronously, when the primary AWS Region starts experiencing failures, there is no
guarantee that all data is replicated to the secondary Region. You can use the replication latency to
determine the maximum amount of data that was not copied into the secondary Region.
Using active-active versus active-passive cluster topology
An active-active cluster topology offers near zero recovery time and the capability for your
streaming application to operate simultaneously in multiple AWS Regions. When a cluster in one
Region is impaired, applications connected to the cluster in the other Region continue processing
data.
Active-passive setups are suited to applications that can run in only one AWS Region at a time,
or when you need more control over the data processing order. Active-passive setups require
more recovery time than active-active setups, as you must start your entire active-passive setup,
including your producers and consumers, in the secondary Region to resume streaming data after a
failover.
Create an active-passive Kafka cluster setup with recommended topic
naming configurations
For an active-passive setup, we recommend that you operate a similar setup of producers, MSK
clusters, and consumers (with the same consumer group name) in two different AWS Regions. It is
important that the two MSK clusters have identical read and write capacity to ensure reliable data
replication. You need to create an MSK Replicator to continuously copy data from the primary to
the standby cluster. You also need to configure your producers to write data into topics on the
cluster in the same AWS Region as the producer.
For an active-passive setup, create a new Replicator with Identical topic name replication (Keep
the same topics name in console) to start replicating data from your MSK cluster in the primary
Region to your cluster in the secondary Region. We recommend that you operate a duplicate set
of producers and consumers in the two AWS Regions, each connecting to the cluster in its own
Region using that cluster's bootstrap string. This simplifies the failover process since it won't
require changes to the bootstrap string. To ensure that consumers read from near where they left
off, consumers in the source and target clusters should have the same consumer group ID.
If you use Identical topic name replication (Keep the same topics name in console) for your MSK
Replicator, it will replicate your topics with the same name as the corresponding source topics.
We recommend that you configure cluster level settings and permissions for your clients on the
target cluster. You do not need to configure topic level settings and literal read ACLs as MSK
Replicator automatically copies them if you have selected the option to copy access control lists.
See Metadata replication.
Failover to the secondary AWS Region
We recommend that you monitor replication latency in the secondary AWS Region using Amazon
CloudWatch. During a service event in the primary AWS Region, replication latency may suddenly
increase. If the latency keeps increasing, use the AWS Service Health Dashboard to check for service
events in the primary AWS Region. If there’s an event, you can failover to the secondary AWS
Region.
Perform a planned failover to the secondary AWS Region
You can conduct a planned failover to test the resiliency of your application against an unexpected
event in your primary AWS Region, which contains your source MSK cluster. A planned failover
should not result in data loss.
If you’re using Identical topic name replication configuration, follow these steps:
1. Shut down all producers and consumers connecting to your source cluster.
2. Create a new MSK Replicator to replicate data from your MSK cluster in the secondary Region
to your MSK cluster in the primary Region with Identical topic name replication (Keep the
same topics name in console). This is required to copy the data that you will be writing to the
secondary Region back to the primary Region so that you can fail back to the primary Region
after the unexpected event has ended.
3. Start producers and consumers connected to the target cluster in the secondary AWS Region.
If you’re using Prefixed topic name configuration, follow these steps to failover:
1. Shut down all producers and consumers connecting to your source cluster.
2. Create a new MSK Replicator to replicate data from your MSK cluster in the secondary Region
to your MSK cluster in the primary Region. This is required to copy the data that you will be
writing to the secondary Region back to the primary Region so that you can fail back to the
primary Region after the unexpected event has ended.
3. Start producers on the target cluster in the secondary AWS Region.
4. Depending on your application’s message ordering requirements, follow the steps in one of
the following tabs.
No message ordering
If your application does not require message ordering, start consumers in the secondary
AWS Region that read from both the local (for example, topic) and replicated topics (for
example, <sourceKafkaClusterAlias>.topic) using a wildcard operator (for example,
.*topic).
Message ordering
If your application requires message ordering, start consumers only for the replicated
topics on the target cluster (for example, <sourceKafkaClusterAlias>.topic), but not the
local topics (for example, topic).
5. Wait for all the consumers of replicated topics on the target MSK cluster to finish processing
all data, so that consumer lag is 0 and the number of records processed is also 0. Then, stop
consumers for the replicated topics on the target cluster. At this point, all records that were
replicated from the source MSK cluster to the target MSK cluster have been consumed.
6. Start consumers for the local topics (for example, topic) on the target MSK cluster.
Perform an unplanned failover to the secondary AWS Region
You can conduct an unplanned failover when there is a service event in the primary AWS Region
that contains your source MSK cluster and you want to temporarily redirect your traffic to the
secondary Region that contains your target MSK cluster. An unplanned failover could result in
some data loss, as MSK Replicator replicates data asynchronously. You can track the message lag
using the metrics in MSK Replicator metrics.
If you’re using Identical topic name replication configuration (Keep the same topics name in
console), follow these steps:
1. Attempt to shut down all producers and consumers connecting to the source MSK cluster in
the primary Region. This operation might not succeed due to impairments in that Region.
2. Start producers and consumers connecting to the target MSK cluster in the secondary AWS
Region to complete the failover. As MSK Replicator also replicates metadata including read
ACLs and consumer group offsets, your producers and consumers will seamlessly resume
processing from near where they left off before failover.
If you’re using Prefixed topic name configuration, follow these steps to fail over:
1. Attempt to shut down all producers and consumers connecting to the source MSK cluster in
the primary Region. This operation might not succeed due to impairments in that Region.
2. Start producers and consumers connecting to the target MSK cluster in the secondary AWS
Region to complete the failover. As MSK Replicator also replicates metadata including read
ACLs and consumer group offsets, your producers and consumers will seamlessly resume
processing from near where they left off before failover.
3. Depending on your application’s message ordering requirements, follow the steps in one of
the following tabs.
No message ordering
If your application does not require message ordering, start consumers in the target AWS
Region that read from both the local (for example, topic) and replicated topics (for
example, <sourceKafkaClusterAlias>.topic) using a wildcard operator (for example,
.*topic).
Message ordering
1. Start consumers only for the replicated topics on the target cluster (for example,
<sourceKafkaClusterAlias>.topic), but not the local topics (for example, topic).
2. Wait for all the consumers of replicated topics on the target MSK cluster to finish
processing all data, so that offset lag is 0 and the number of records processed is also
0. Then, stop consumers for the replicated topics on the target cluster. At this point, all
records that were replicated from the source MSK cluster to the target MSK cluster have
been consumed.
3. Start consumers for the local topics (for example, topic) on the target MSK cluster.
4. Once the service event has ended in the primary Region, create a new MSK Replicator to
replicate data from your MSK cluster in the secondary Region to your MSK cluster in the
primary Region, with the Replicator starting position set to earliest. This is required to copy the
data that you will be writing to the secondary Region back to the primary Region so that
you can fail back to the primary Region after the service event has ended. If you don't set the
Replicator starting position to earliest, any data you produced to the cluster in the secondary
Region during the service event in the primary Region will not be copied back to the cluster in
the primary Region.
Perform failback to the primary AWS Region
You can fail back to the primary AWS Region after the service event in that Region has ended.
If you’re using Identical topic name replication configuration, follow these steps:
1. Create a new MSK Replicator with your secondary cluster as the source and your primary
cluster as the target, the starting position set to earliest, and Identical topic name replication
(Keep the same topics name in console).
This will start the process of copying all data written to the secondary cluster after failover
back to the primary Region.
2. Monitor the MessageLag metric on the new Replicator in Amazon CloudWatch until it reaches
0, which indicates all data has been replicated from secondary to primary.
3. After all data has been replicated, stop all producers connecting to the secondary cluster and
start producers connecting to the primary cluster.
4. Wait for the MaxOffsetLag metric for your consumers connecting to the secondary cluster to
become 0 to ensure they have processed all the data. See Monitor consumer lags.
5. Once all data has been processed, stop consumers in the secondary Region and start
consumers connecting to the primary cluster to complete the failback.
6. Delete the Replicator you created in the first step that is replicating data from your secondary
cluster to the primary cluster.
7. Verify that your existing Replicator copying data from the primary to the secondary cluster has
a status of RUNNING and that its ReplicatorThroughput metric in Amazon CloudWatch is 0.
Note that when you create a new Replicator with the starting position set to earliest for failback,
it starts reading all data in your secondary cluster's topics. Depending on your data retention
settings, your topics may have data that came from your source cluster. While MSK Replicator
automatically filters those messages, you will still incur data processing and transfer charges
for all the data in your secondary cluster. You can track the total data processed by the Replicator
using ReplicatorBytesInPerSec. See MSK Replicator metrics.
If you’re using Prefixed topic name configuration, follow these steps:
You should initiate failback steps only after replication from the cluster in the secondary Region to
the cluster in the primary Region has caught up and the MessageLag metric in Amazon CloudWatch
is close to 0. A planned failback should not result in any data loss.
1. Shut down all producers and consumers connecting to the MSK cluster in the secondary
Region.
2. For an active-passive topology, delete the Replicator that is replicating data from the cluster
in the secondary Region to the primary Region. You do not need to delete the Replicator for an
active-active topology.
3. Start producers connecting to the MSK cluster in the primary Region.
4. Depending on your application’s message ordering requirements, follow the steps in one of
the following tabs.
No message ordering
If your application does not require message ordering, start consumers in the primary
AWS Region that read from both the local (for example, topic) and replicated topics (for
example, <sourceKafkaClusterAlias>.topic) using a wildcard operator (for example,
.*topic). The consumers on local topics (for example, topic) will resume from the last offset
they consumed before the failover. If there was any unprocessed data from before the failover, it
will get processed now. In the case of a planned failover, there should be no such records.
Message ordering
1. Start consumers only for the replicated topics in the primary Region (for example,
<sourceKafkaClusterAlias>.topic), but not the local topics (for example, topic).
2. Wait for all the consumers of replicated topics on the cluster in the primary Region to
finish processing all data, so that offset lag is 0 and the number of records processed is
also 0. Then, stop consumers for the replicated topics on the cluster in the primary Region.
At this point, all records that were produced in the secondary Region after failover have
been consumed in the primary Region.
3. Start consumers for the local topics (for example, topic) on the cluster in the primary
Region.
5. Verify that the existing Replicator from the cluster in the primary Region to the cluster in the
secondary Region is in the RUNNING state and working as expected, using the
ReplicatorThroughput and latency metrics.
Create an active-active setup using MSK Replicator
If you want to create an active-active setup where both MSK clusters are actively serving reads and
writes, we recommend that you use an MSK Replicator with Prefixed topic name replication (Add
prefix to topics name in console). However, this will require you to reconfigure your consumers to
read the replicated topics.
Follow these steps to set up active-active topology between source MSK cluster A and target MSK
cluster B.
1. Create an MSK Replicator with MSK cluster A as the source and MSK cluster B as the target.
2. After the above MSK Replicator has been successfully created, create a Replicator with cluster B
as source and cluster A as target.
3. Create two sets of producers, each writing data at the same time into the local topic (for
example, “topic”) in the cluster in the same region as the producer.
4. Create two sets of consumers, each reading data using a wildcard subscription (such as
“.*topic”) from the MSK cluster in the same AWS Region as the consumer. This way, your
consumers will automatically read data produced locally in the Region from the local topic
(for example, topic), as well as data replicated from the other Region in the prefixed topic
(for example, <sourceKafkaClusterAlias>.topic). These two sets of consumers should have
different consumer group IDs so that consumer group offsets are not overwritten when MSK
Replicator copies them to the other cluster.
If you want to avoid reconfiguring your clients, instead of Prefixed topic name replication (Add
prefix to topics name in console), you can create the MSK Replicators using Identical topic name
replication (Keep the same topics name in console) to create an active-active setup. However, you
will pay additional data processing and data transfer charges for each Replicator. This is because
each Replicator will need to process twice the usual amount of data, once for replication and again
to prevent infinite loops. You can track the total amount of data processed by each Replicator
using the ReplicatorBytesInPerSec metric. See Monitor replication. This metric includes the
data replicated to the target cluster as well as the data filtered by MSK Replicator to prevent the
data being copied back to the same topic it originated from.
Note
If you're using Identical topic name replication (Keep the same topics name in console)
to set up active-active topology, wait at least 30 seconds after deleting a topic before
re-creating a topic with the same name. This waiting period helps to prevent duplicated
messages being replicated back to the source cluster. Your consumers must be able to
reprocess duplicate messages without downstream impact. See Considerations for building
multi-Region Apache Kafka applications.
Migrate from one Amazon MSK cluster to another using MSK
Replicator
You can use Identical topic name replication for cluster migration, but your consumers must be
able to handle duplicate messages without downstream impact. This is because MSK Replicator
provides at-least-once replication, which can lead to duplicate messages in rare scenarios. If your
consumers meet this requirement, follow these steps.
1. Create a Replicator that replicates data from your old cluster to the new cluster with Replicator's
starting position set to Earliest and using Identical topic name replication (Keep the same topics
name in console).
2. Configure cluster-level settings and permissions on the new cluster. You do not need to
configure topic-level settings and “literal” read ACLs, as MSK Replicator automatically copies
them.
3. Monitor the MessageLag metric in Amazon CloudWatch until it reaches 0, which indicates all
data has been replicated.
4. After all data has been replicated, stop producers from writing data to the old cluster.
5. Reconfigure those producers to connect to the new cluster and start them.
6. Monitor the MaxOffsetLag metric for your consumers reading data from the old cluster until it
becomes 0, which indicates all existing data has been processed.
7. Stop consumers that are connecting to the old cluster.
8. Reconfigure consumers to connect to the new cluster and start them.
Migrate from self-managed MirrorMaker2 to MSK Replicator
To migrate from self-managed MirrorMaker 2 (MM2) to MSK Replicator, follow these steps:
1. Stop the producer that is writing to your source Amazon MSK cluster.
2. Allow MM2 to replicate all the messages on your source cluster's topics. You can monitor the
consumer lag for the MM2 consumer on your source MSK cluster to determine when all data has
been replicated.
3. Create a new Replicator with starting position set to Latest and topic name configuration set to
IDENTICAL (Same topic names replication in console).
4. Once your Replicator is in the RUNNING state, you can start the producers writing to the source
cluster again.
Troubleshoot MSK Replicator
The following information can help you troubleshoot problems that you might have with MSK
Replicator. See Troubleshoot your Amazon MSK cluster for problem solving information about
other Amazon MSK features. You can also post your issue to AWS re:Post.
MSK Replicator state goes from CREATING to FAILED
Here are some common causes for MSK Replicator creation failure.
1. Verify that the security groups you provided for the Replicator creation in the Target cluster
section have outbound rules that allow traffic to your target cluster's security groups. Also verify
that your target cluster's security groups have inbound rules that accept traffic from the security
groups you provided for the Replicator creation in the Target cluster section. See Choose your
target cluster. (A sample command for inspecting these rules follows this list.)
2. If you are creating a Replicator for cross-Region replication, verify that your source cluster has
multi-VPC connectivity turned on for the IAM Access Control authentication method. See Amazon
MSK multi-VPC private connectivity in a single Region. Also verify that a cluster policy is set up
on the source cluster so that MSK Replicator can connect to the source cluster. See Prepare
the Amazon MSK source cluster.
3. Verify that the IAM role that you provided during MSK Replicator creation has the permissions
required to read from and write to your source and target clusters. Also, verify that the IAM role
has permissions to write to topics. See Configure replicator settings and permissions.
4. Verify that your network ACLs are not blocking the connection between the MSK Replicator and
your source and target clusters.
5. It's possible that the source or target cluster was not fully available when MSK Replicator tried
to connect to it. This might be due to excessive load, high disk usage, or high CPU usage, which
prevents the Replicator from connecting to the brokers. Fix the issue with the brokers and retry
Replicator creation.
After you have performed the validations above, create the MSK Replicator again.
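For the security group checks in item 1, you can inspect the rules from the AWS CLI. A minimal
sketch, where sg-0123456789abcdef0 is a hypothetical ID for one of the security groups involved:

aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[0].{Inbound:IpPermissions,Outbound:IpPermissionsEgress}'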
MSK Replicator appears stuck in the CREATING state
Sometimes MSK Replicator creation can take up to 30 minutes. Wait for 30 minutes and check the
state of the Replicator again.
MSK Replicator is not replicating data or replicating only partial data
Follow these steps to troubleshoot data replication problems.
1. Verify that your Replicator is not running into any authentication errors using the AuthError
metric provided by MSK Replicator in Amazon CloudWatch. If this metric is above 0, check that the
policy of the IAM role you provided for the Replicator is valid and that no deny statements are
set for the cluster permissions. Based on the clusterAlias dimension, you can identify whether the
source or target cluster is experiencing authentication errors. (A sample metric query follows this
list.)
2. Verify that your source and target clusters are not experiencing any issues. It is possible that the
Replicator is not able to connect to your source or target cluster. This might happen due to too
many connections, a disk at full capacity, or high CPU usage.
3. Verify that your source and target clusters are reachable from MSK Replicator using the
KafkaClusterPingSuccessCount metric in Amazon CloudWatch. Based on the clusterAlias dimension,
you can identify whether the source or target cluster is experiencing connectivity problems. If this
metric is 0 or has no datapoints, the connection is unhealthy. You should check the network and
IAM role permissions that MSK Replicator is using to connect to your clusters.
4. Verify that your Replicator is not running into failures due to missing topic-level permissions
using the ReplicatorFailure metric in Amazon CloudWatch. If this metric is above 0, check the
IAM role you provided for topic-level permissions.
5. Verify that the regular expression you provided in the allow list while creating the Replicator
matches the names of the topics you want to replicate. Also, verify that the topics are not being
excluded from replication due to a regular expression in the deny list.
6. Note that it may take up to 30 seconds for the Replicator to detect and create new topics
or topic partitions on the target cluster. Any messages produced to the source topic before the
topic has been created on the target cluster will not be replicated if the Replicator starting position
is latest (the default). Alternatively, you can start replication from the earliest offset in the source
cluster topic partitions if you want to replicate existing messages on your topics to the target
cluster. See Configure replicator settings and permissions.
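As referenced in item 1 of this list, the following sketch queries the AuthError metric. It assumes
MSK Replicator metrics appear under the AWS/Kafka namespace with ReplicatorName and
ClusterAlias dimensions; confirm the exact names in your CloudWatch console.

aws cloudwatch get-metric-statistics \
  --namespace "AWS/Kafka" \
  --metric-name AuthError \
  --dimensions Name=ReplicatorName,Value=<your-replicator-name> \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 300 \
  --statistics Sum

A nonzero Sum indicates authentication failures against the cluster identified by the clusterAlias
dimension.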
Message offsets in the target cluster are different than the source
cluster
As part of replicating data, MSK Replicator consumes messages from the source cluster and
produces them to the target cluster. This can lead to messages having different offsets on your
source and target clusters. However, if you have turned on consumer groups offsets syncing during
Replicator creation, MSK Replicator will automatically translate the offsets while copying the
metadata so that after failing over to the target cluster, your consumers can resume processing
from near where they left off in the source cluster.
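One way to observe offset translation is to compare a consumer group's committed position on
both clusters with the Kafka CLI. A sketch, where my-app-group is a hypothetical consumer group
name:

# Committed position on the source cluster
./kafka-consumer-groups.sh --bootstrap-server <source-cluster-bootstrap-server> \
  --describe --group my-app-group --command-config <client-properties-for-iam-auth>

# Translated position on the target cluster (after consumer group offset syncing)
./kafka-consumer-groups.sh --bootstrap-server <target-cluster-bootstrap-server> \
  --describe --group my-app-group --command-config <client-properties-for-iam-auth>

The CURRENT-OFFSET values will generally differ between the two clusters, while the LAG values
should be comparable once offset syncing has caught up.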
MSK Replicator is not syncing consumer groups offsets or consumer
group does not exist on target cluster
Follow these steps to troubleshoot metadata replication problems.
1. Verify that your data replication is working as expected. If not, see MSK Replicator is not
replicating data or replicating only partial data.
2. Verify that the regular expression you provided in the allow list while creating the Replicator
matches the names of the consumer groups you want to replicate. Also, verify that the
consumer groups are not being excluded from replication due to a regular expression in the deny
list.
3. Verify that MSK Replicator has created the topic on the target cluster. It may take up to 30
seconds for the Replicator to detect and create new topics or topic partitions on the target
cluster. Any messages produced to the source topic before the topic has been created on the
target cluster will not be replicated if the Replicator starting position is latest (the default). If your
consumer group on the source cluster has consumed only messages that have not been
replicated by MSK Replicator, the consumer group will not be replicated to the target cluster.
After the topic is successfully created on the target cluster, MSK Replicator will start replicating
newly written messages from the source cluster to the target. Once your consumer group starts
reading these messages from the source, MSK Replicator will automatically replicate the
consumer group to the target cluster. Alternatively, you can start replication from the earliest
offset in the source cluster topic partitions if you want to replicate existing messages on your
topics to the target cluster. See Configure replicator settings and permissions.
Note
MSK Replicator optimizes consumer groups offset syncing for your consumers on the
source cluster which are reading from a position closer to the end of the topic partition. If
your consumer groups are lagging on the source cluster, you may see higher lag for those
consumer groups on the target as compared to the source. This means after failover to the
target cluster, your consumers will reprocess more duplicate messages. To reduce this lag,
your consumers on the source cluster would need to catch up and start consuming from the
tip of the stream (end of the topic partition). As your consumers catch up, MSK Replicator
will automatically reduce the lag.
Replication latency is high or keeps increasing
Here are some common causes for high replication latency.
1. Verify that you have the right number of partitions on your source and target MSK clusters.
Having too few or too many partitions can impact performance. For guidance on choosing the
number of partitions, see Best practices for using MSK Replicator. The following table shows the
recommended minimum number of partitions for getting the throughput you want with MSK
Replicator.
Throughput and recommended minimum number of partitions
Throughput (MB/s) Minimum number of partitions required
50 167
100 334
250 833
500 1666
1000 3333
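For example, interpolating from this table (roughly 3.3 partitions per MB/s of replication
throughput), a 200 MB/s workload would need on the order of 670 partitions.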
2. Verify that you have enough read and write capacity in your source and target MSK clusters
to support the replication traffic. MSK Replicator acts as a consumer for your source cluster
(egress) and as a producer for your target cluster (ingress). Therefore, you should provision
cluster capacity to support the replication traffic in addition to the other traffic on your clusters.
See the Amazon MSK best practices for guidance on sizing your MSK clusters.
3. Replication latency might vary for MSK clusters in different source and destination AWS Region
pairs, depending on how geographically far apart the clusters are from each other. For example,
replication latency is typically lower when replicating between clusters in the Europe (Ireland)
and Europe (London) Regions than when replicating between clusters in the Europe (Ireland)
and Asia Pacific (Sydney) Regions.
4. Verify that your Replicator is not getting throttled due to overly aggressive quotas set on your
source or target clusters. You can use the ThrottleTime metric provided by MSK Replicator in
Amazon CloudWatch to see the average time in milliseconds that a request was throttled by
brokers on your source or target cluster. If this metric is above 0, you should adjust Kafka quotas
to reduce throttling so that the Replicator can catch up. See Managing MSK Replicator throughput
using Kafka quotas for information on managing Kafka quotas for the Replicator.
5. ReplicationLatency and MessageLag might increase when an AWS Region becomes degraded.
Use the AWS Service Health Dashboard to check for an MSK service event in the Region where
your primary MSK cluster is located. If there's a service event, you can temporarily redirect your
application reads and writes to the other Region.
Best practices for using MSK Replicator
This section covers common best practices and implementation strategies for using Amazon MSK
Replicator.
Topics
Managing MSK Replicator throughput using Kafka quotas
Setting cluster retention period
Managing MSK Replicator throughput using Kafka quotas
Since MSK Replicator acts as a consumer for your source cluster, replication can cause other
consumers to be throttled on your source cluster. The amount of throttling depends on the
read capacity you have on your source cluster and the throughput of data you're replicating. We
recommend that you provision identical capacity for your source and target clusters, and account
for the replication throughput when calculating how much capacity you need.
You can also set Kafka quotas for the Replicator on your source and target clusters to control
how much capacity the MSK Replicator can use. A network bandwidth quota is recommended. A
network bandwidth quota defines a byte-rate threshold, in bytes per second, for one or more
clients sharing the quota. This quota is defined on a per-broker basis.
Follow these steps to apply a quota.
1. Retrieve the bootstrap server string for the source cluster. See Get the bootstrap brokers for an
Amazon MSK cluster.
2. Retrieve the service execution role (SER) used by the MSK Replicator. This is the SER you used for
a CreateReplicator request. You can also pull the SER from the DescribeReplicator response
from an existing Replicator.
3. Using Kafka CLI tools, run the following command against the source cluster.
./kafka-configs.sh --bootstrap-server <source-cluster-bootstrap-server> --alter \
  --add-config 'consumer_byte_rate=<quota_in_bytes_per_second>' \
  --entity-type users \
  --entity-name arn:aws:sts::<customer-account-id>:assumed-role/<ser-role-name>/<customer-account-id> \
  --command-config <client-properties-for-iam-auth>

4. After executing the above command, verify that the ReplicatorThroughput metric does not
cross the quota you have set.
Note that if you re-use a service execution role across multiple MSK Replicators, they are all
subject to this quota. If you want to maintain separate quotas per Replicator, use separate service
execution roles. (A sample command for verifying the quota follows.)
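To confirm that the quota was applied, you can describe the same user entity, as in this sketch
that reuses the placeholders from the command above:

./kafka-configs.sh --bootstrap-server <source-cluster-bootstrap-server> --describe \
  --entity-type users \
  --entity-name arn:aws:sts::<customer-account-id>:assumed-role/<ser-role-name>/<customer-account-id> \
  --command-config <client-properties-for-iam-auth>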
For more information on using MSK IAM authentication with quotas, see Multi-tenancy Apache
Kafka clusters in Amazon MSK with IAM access control and Kafka Quotas – Part 1.
Warning
Setting an extremely low consumer_byte_rate may cause your MSK Replicator to act in
unexpected ways.
Setting cluster retention period
You can set the log retention period for MSK provisioned and serverless clusters. The
recommended retention period is 7 days. See Cluster configuration changes or Supported MSK
Serverless cluster configuration.
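As a sketch of applying the recommended 7-day retention at the topic level with the standard
Kafka CLI (7 days is 604800000 milliseconds; ExampleTopic is a hypothetical topic name):

./kafka-configs.sh --bootstrap-server <bootstrap-server> --alter \
  --entity-type topics --entity-name ExampleTopic \
  --add-config retention.ms=604800000 \
  --command-config <client-properties-for-iam-auth>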
Understand cluster states
The following table shows the possible states of a cluster and describes what they mean. It also
describes what actions you can and cannot perform when a cluster is in one of these states. To
find out the state of a cluster, you can visit the AWS Management Console. You can also use the
describe-cluster-v2 command or the DescribeClusterV2 operation to describe the cluster. The
description of a cluster includes its state.
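For example, the following AWS CLI sketch returns only the state field. Replace clusterARN with
the ARN of your cluster.

aws kafka describe-cluster-v2 --cluster-arn clusterARN \
  --query 'ClusterInfo.State' --output text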
Cluster state – Meaning and possible actions

ACTIVE – You can produce and consume data. You can also perform Amazon MSK API and AWS CLI
operations on the cluster.

CREATING – Amazon MSK is setting up the cluster. You must wait for the cluster to reach the
ACTIVE state before you can use it to produce or consume data or to perform Amazon MSK API or
AWS CLI operations on it.

DELETING – The cluster is being deleted. You cannot use it to produce or consume data. You also
cannot perform Amazon MSK API or AWS CLI operations on it.

FAILED – The cluster creation or deletion process failed. You cannot use the cluster to produce or
consume data. You can delete the cluster but cannot perform Amazon MSK API or AWS CLI update
operations on it.

HEALING – Amazon MSK is running an internal operation, like replacing an unhealthy broker (for
example, one that is unresponsive). You can still use the cluster to produce and consume data.
However, you cannot perform Amazon MSK API or AWS CLI update operations on the cluster until
it returns to the ACTIVE state.

MAINTENANCE – Amazon MSK is performing routine maintenance operations on the cluster, such
as security patching. You can still use the cluster to produce and consume data. However, you
cannot perform Amazon MSK API or AWS CLI update operations on the cluster until it returns to
the ACTIVE state.

REBOOTING_BROKER – Amazon MSK is rebooting a broker. You can still use the cluster to produce
and consume data. However, you cannot perform Amazon MSK API or AWS CLI update operations
on the cluster until it returns to the ACTIVE state.

UPDATING – A user-initiated Amazon MSK API or AWS CLI operation is updating the cluster. You
can still use the cluster to produce and consume data. However, you cannot perform any additional
Amazon MSK API or AWS CLI update operations on the cluster until it returns to the ACTIVE state.
Security in Amazon Managed Streaming for Apache
Kafka
Cloud security at AWS is the highest priority. As an AWS customer, you benefit from a data center
and network architecture that is built to meet the requirements of the most security-sensitive
organizations.
Security is a shared responsibility between AWS and you. The shared responsibility model describes
this as security of the cloud and security in the cloud:
Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS
services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-
party auditors regularly test and verify the effectiveness of our security as part of the AWS
Compliance Programs. To learn about the compliance programs that apply to Amazon Managed
Streaming for Apache Kafka, see Amazon Web Services in Scope by Compliance Program.
Security in the cloud – Your responsibility is determined by the AWS service that you use. You
are also responsible for other factors including the sensitivity of your data, your company's
requirements, and applicable laws and regulations.
This documentation helps you understand how to apply the shared responsibility model when
using Amazon MSK. The following topics show you how to configure Amazon MSK to meet your
security and compliance objectives. You also learn how to use other Amazon Web Services that
help you to monitor and secure your Amazon MSK resources.
Topics
Data protection in Amazon Managed Streaming for Apache Kafka
Authentication and authorization for Amazon MSK APIs
Authentication and authorization for Apache Kafka APIs
Changing an Amazon MSK cluster's security group
Control access to Apache ZooKeeper nodes in your Amazon MSK cluster
Amazon MSK logging
Compliance validation for Amazon Managed Streaming for Apache Kafka
Resilience in Amazon Managed Streaming for Apache Kafka
Infrastructure security in Amazon Managed Streaming for Apache Kafka
Data protection in Amazon Managed Streaming for Apache
Kafka
The AWS shared responsibility model applies to data protection in Amazon Managed Streaming
for Apache Kafka. As described in this model, AWS is responsible for protecting the global
infrastructure that runs all of the AWS Cloud. You are responsible for maintaining control over your
content that is hosted on this infrastructure. You are also responsible for the security configuration
and management tasks for the AWS services that you use. For more information about data
privacy, see the Data Privacy FAQ. For information about data protection in Europe, see the AWS
Shared Responsibility Model and GDPR blog post on the AWS Security Blog.
For data protection purposes, we recommend that you protect AWS account credentials and set
up individual users with AWS IAM Identity Center or AWS Identity and Access Management (IAM).
That way, each user is given only the permissions necessary to fulfill their job duties. We also
recommend that you secure your data in the following ways:
Use multi-factor authentication (MFA) with each account.
Use SSL/TLS to communicate with AWS resources. We require TLS 1.2 and recommend TLS 1.3.
Set up API and user activity logging with AWS CloudTrail. For information about using CloudTrail
trails to capture AWS activities, see Working with CloudTrail trails in the AWS CloudTrail User
Guide.
Use AWS encryption solutions, along with all default security controls within AWS services.
Use advanced managed security services such as Amazon Macie, which assists in discovering and
securing sensitive data that is stored in Amazon S3.
If you require FIPS 140-3 validated cryptographic modules when accessing AWS through a
command line interface or an API, use a FIPS endpoint. For more information about the available
FIPS endpoints, see Federal Information Processing Standard (FIPS) 140-3.
We strongly recommend that you never put confidential or sensitive information, such as your
customers' email addresses, into tags or free-form text fields such as a Name field. This includes
when you work with Amazon MSK or other AWS services using the console, API, AWS CLI, or AWS
SDKs. Any data that you enter into tags or free-form text fields used for names may be used for
billing or diagnostic logs. If you provide a URL to an external server, we strongly recommend that
you do not include credentials information in the URL to validate your request to that server.
Topics
Amazon MSK encryption
Get started with Amazon MSK encryption
Amazon MSK encryption
Amazon MSK provides data encryption options that you can use to meet strict data management
requirements. The certificates that Amazon MSK uses for encryption must be renewed every 13
months. Amazon MSK automatically renews these certificates for all clusters. It sets the state of the
cluster to MAINTENANCE when it starts the certificate-update operation. It sets it back to ACTIVE
when the update is done. While a cluster is in the MAINTENANCE state, you can continue to produce
and consume data, but you can't perform any update operations on it.
Amazon MSK encryption at rest
Amazon MSK integrates with AWS Key Management Service (KMS) to offer transparent server-
side encryption. Amazon MSK always encrypts your data at rest. When you create an MSK cluster,
you can specify the AWS KMS key that you want Amazon MSK to use to encrypt your data at rest.
If you don't specify a KMS key, Amazon MSK creates an AWS managed key for you and uses it on
your behalf. For more information about KMS keys, see AWS KMS keys in the AWS Key Management
Service Developer Guide.
Amazon MSK encryption in transit
Amazon MSK uses TLS 1.2. By default, it encrypts data in transit between the brokers of your MSK
cluster. You can override this default at the time you create the cluster.
For communication between clients and brokers, you must specify one of the following three
settings:
Only allow TLS encrypted data. This is the default setting.
Allow both plaintext and TLS encrypted data.
Only allow plaintext data.
Amazon MSK brokers use public AWS Certificate Manager certificates. Therefore, any truststore
that trusts Amazon Trust Services also trusts the certificates of Amazon MSK brokers.
While we highly recommend enabling in-transit encryption, it can add CPU overhead and a few
milliseconds of latency. Most use cases aren't sensitive to these differences, however, and the
magnitude of the impact depends on the configuration of your cluster, clients, and usage profile.
Get started with Amazon MSK encryption
When creating an MSK cluster, you can specify encryption settings in JSON format. The following is
an example.
{
"EncryptionAtRest": {
"DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:123456789012:key/abcdabcd-1234-
abcd-1234-abcd123e8e8e"
},
"EncryptionInTransit": {
"InCluster": true,
"ClientBroker": "TLS"
}
}
For DataVolumeKMSKeyId, you can specify a customer managed key or the AWS managed key for
MSK in your account (alias/aws/kafka). If you don't specify EncryptionAtRest, Amazon MSK
still encrypts your data at rest under the AWS managed key. To determine which key your cluster is
using, send a GET request or invoke the DescribeCluster API operation.
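For example, the following AWS CLI sketch extracts the key from the DescribeCluster response.
Replace clusterARN with the ARN of your cluster.

aws kafka describe-cluster --cluster-arn clusterARN \
  --query 'ClusterInfo.EncryptionInfo.EncryptionAtRest.DataVolumeKMSKeyId' --output text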
For EncryptionInTransit, the default value of InCluster is true, but you can set it to false if
you don't want Amazon MSK to encrypt your data as it passes between brokers.
To specify the encryption mode for data in transit between clients and brokers, set ClientBroker
to one of three values: TLS, TLS_PLAINTEXT, or PLAINTEXT.
Topics
Specify encryption settings when creating an Amazon MSK cluster
Test Amazon MSK TLS encryption
Specify encryption settings when creating an Amazon MSK cluster
This process describes how to specify encryption settings when creating an Amazon MSK cluster.
Specify encryption settings when creating a cluster
1. Save the contents of the previous example in a file and give the file any name that you want.
For example, call it encryption-settings.json.
2. Run the create-cluster command and use the encryption-info option to point to the
file where you saved your configuration JSON. The following is an example. Replace {YOUR
MSK VERSION} with a version that matches the Apache Kafka client version. For information
on how to find your MSK cluster version, see To find the version of your MSK cluster. Be aware
that using an Apache Kafka client version that is not the same as your MSK cluster version may
lead to Apache Kafka data corruption, loss, and downtime.
aws kafka create-cluster --cluster-name "ExampleClusterName" --broker-node-group-
info file://brokernodegroupinfo.json --encryption-info file://encryptioninfo.json
--kafka-version "{YOUR MSK VERSION}" --number-of-broker-nodes 3
The following is an example of a successful response after running this command.
{
"ClusterArn": "arn:aws:kafka:us-east-1:123456789012:cluster/SecondTLSTest/
abcdabcd-1234-abcd-1234-abcd123e8e8e",
"ClusterName": "ExampleClusterName",
"State": "CREATING"
}
Test Amazon MSK TLS encryption
This process describes how to test TLS encryption on Amazon MSK.
To test TLS encryption
1. Create a client machine following the guidance in the section called “Create a client machine”.
2. Install Apache Kafka on the client machine.
3. In this example we use the JVM truststore to talk to the MSK cluster. To do this, first create
a folder named /tmp on the client machine. Then, go to the bin folder of the Apache Kafka
installation, and run the following command. (Your JVM path might be different.)
cp /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86_64/jre/lib/security/
cacerts /tmp/kafka.client.truststore.jks
4. While still in the bin folder of the Apache Kafka installation on the client machine, create a
text file named client.properties with the following contents.
security.protocol=SSL
ssl.truststore.location=/tmp/kafka.client.truststore.jks
5. Run the following command on a machine that has the AWS CLI installed, replacing
clusterARN with the ARN of your cluster.
aws kafka get-bootstrap-brokers --cluster-arn clusterARN
A successful result looks like the following. Save this result because you need it for the next
step.
{
"BootstrapBrokerStringTls": "a-1.example.g7oein.c2.kafka.us-
east-1.amazonaws.com:0123,a-3.example.g7oein.c2.kafka.us-
east-1.amazonaws.com:0123,a-2.example.g7oein.c2.kafka.us-east-1.amazonaws.com:0123"
}
6. Run the following command, replacing BootstrapBrokerStringTls with one of the broker
endpoints that you obtained in the previous step.
<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-
list BootstrapBrokerStringTls --producer.config client.properties --topic
TLSTestTopic
7. Open a new command window and connect to the same client machine. Then, run the
following command to create a console consumer.
<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-
server BootstrapBrokerStringTls --consumer.config client.properties --topic
TLSTestTopic
8. In the producer window, type a text message followed by a return, and look for the same
message in the consumer window. Amazon MSK encrypted this message in transit.
For more information about configuring Apache Kafka clients to work with encrypted data, see
Configuring Kafka Clients.
Authentication and authorization for Amazon MSK APIs
AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely
control access to AWS resources. IAM administrators control who can be authenticated (signed in)
and authorized (have permissions) to use Amazon MSK resources. IAM is an AWS service that you
can use with no additional charge.
Topics
How Amazon MSK works with IAM
Amazon MSK identity-based policy examples
Service-linked roles for Amazon MSK
AWS managed policies for Amazon MSK
Troubleshoot Amazon MSK identity and access
How Amazon MSK works with IAM
Before you use IAM to manage access to Amazon MSK, you should understand what IAM features
are available to use with Amazon MSK. To get a high-level view of how Amazon MSK and other
AWS services work with IAM, see AWS Services That Work with IAM in the IAM User Guide.
Topics
Amazon MSK identity-based policies
Amazon MSK resource-based policies
Authorization based on Amazon MSK tags
Amazon MSK IAM roles
Amazon MSK identity-based policies
With IAM identity-based policies, you can specify allowed or denied actions and resources as
well as the conditions under which actions are allowed or denied. Amazon MSK supports specific
actions, resources, and condition keys. To learn about all of the elements that you use in a JSON
policy, see IAM JSON Policy Elements Reference in the IAM User Guide.
Actions for Amazon MSK identity-based policies
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
The Action element of a JSON policy describes the actions that you can use to allow or deny
access in a policy. Policy actions usually have the same name as the associated AWS API operation.
There are some exceptions, such as permission-only actions that don't have a matching API
operation. There are also some operations that require multiple actions in a policy. These
additional actions are called dependent actions.
Include actions in a policy to grant permissions to perform the associated operation.
Policy actions in Amazon MSK use the following prefix before the action: kafka:. For example, to
grant someone permission to describe an MSK cluster with the Amazon MSK DescribeCluster
API operation, you include the kafka:DescribeCluster action in their policy. Policy statements
must include either an Action or NotAction element. Amazon MSK defines its own set of actions
that describe tasks that you can perform with this service.
To specify multiple actions in a single statement, separate them with commas as follows:
"Action": ["kafka:action1", "kafka:action2"]
You can specify multiple actions using wildcards (*). For example, to specify all actions that begin
with the word Describe, include the following action:
"Action": "kafka:Describe*"
To see a list of Amazon MSK actions, see Actions, resources, and condition keys for Amazon
Managed Streaming for Apache Kafka in the IAM User Guide.
Resources for Amazon MSK identity-based policies
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
The Resource JSON policy element specifies the object or objects to which the action applies.
Statements must include either a Resource or a NotResource element. As a best practice,
specify a resource using its Amazon Resource Name (ARN). You can do this for actions that support
a specific resource type, known as resource-level permissions.
For actions that don't support resource-level permissions, such as listing operations, use a wildcard
(*) to indicate that the statement applies to all resources.
"Resource": "*"
The Amazon MSK instance resource has the following ARN:
arn:${Partition}:kafka:${Region}:${Account}:cluster/${ClusterName}/${UUID}
For more information about the format of ARNs, see Amazon Resource Names (ARNs) and AWS
Service Namespaces.
For example, to specify the CustomerMessages instance in your statement, use the following
ARN:
"Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/CustomerMessages/abcd1234-
abcd-dcba-4321-a1b2abcd9f9f-2"
To specify all instances that belong to a specific account, use the wildcard (*):
"Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/*"
Some Amazon MSK actions, such as those for creating resources, cannot be performed on a specific
resource. In those cases, you must use the wildcard (*).
"Resource": "*"
To specify multiple resources in a single statement, separate the ARNs with commas.
"Resource": ["resource1", "resource2"]
To see a list of Amazon MSK resource types and their ARNs, see Resources Defined by Amazon
Managed Streaming for Apache Kafka in the IAM User Guide. To learn with which actions you can
specify the ARN of each resource, see Actions Defined by Amazon Managed Streaming for Apache
Kafka.
Condition keys for Amazon MSK identity-based policies
Administrators can use AWS JSON policies to specify who has access to what. That is, which
principal can perform actions on what resources, and under what conditions.
The Condition element (or Condition block) lets you specify conditions in which a statement
is in effect. The Condition element is optional. You can create conditional expressions that use
condition operators, such as equals or less than, to match the condition in the policy with values in
the request.
If you specify multiple Condition elements in a statement, or multiple keys in a single
Condition element, AWS evaluates them using a logical AND operation. If you specify multiple
values for a single condition key, AWS evaluates the condition using a logical OR operation. All of
the conditions must be met before the statement's permissions are granted.
You can also use placeholder variables when you specify conditions. For example, you can grant
an IAM user permission to access a resource only if it is tagged with their IAM user name. For more
information, see IAM policy elements: variables and tags in the IAM User Guide.
AWS supports global condition keys and service-specific condition keys. To see all AWS global
condition keys, see AWS global condition context keys in the IAM User Guide.
Amazon MSK defines its own set of condition keys and also supports using some global condition
keys.
To see a list of Amazon MSK condition keys, see Condition Keys for Amazon Managed Streaming
for Apache Kafka in the IAM User Guide. To learn with which actions and resources you can use a
condition key, see Actions Defined by Amazon Managed Streaming for Apache Kafka.
Examples for Amazon MSK identity-based policies
To view examples of Amazon MSK identity-based policies, see Amazon MSK identity-based policy
examples.
Amazon MSK resource-based policies
Amazon MSK supports a cluster policy (also known as a resource-based policy) for use with
Amazon MSK clusters. You can use a cluster policy to define which IAM principals have cross-
account permissions to set up private connectivity to your Amazon MSK cluster. When used with
IAM client authentication, you can also use the cluster policy to granularly define Kafka data plane
permissions for the connecting clients.
To view an example of how to configure a cluster policy, refer to Step 2: Attach a cluster policy to
the MSK cluster.
Authorization based on Amazon MSK tags
You can attach tags to Amazon MSK clusters. To control access based on tags, you provide tag
information in the condition element of a policy using the kafka:ResourceTag/key-name,
aws:RequestTag/key-name, or aws:TagKeys condition keys. For more information about
tagging Amazon MSK resources, see the section called “Tag a Amazon MSK cluster”.
To view an example identity-based policy for limiting access to a cluster based on the tags on that
cluster, see Accessing Amazon MSK clusters based on tags.
Amazon MSK IAM roles
An IAM role is an entity within your Amazon Web Services account that has specific permissions.
Using temporary credentials with Amazon MSK
You can use temporary credentials to sign in with federation, assume an IAM role, or to assume a
cross-account role. You obtain temporary security credentials by calling AWS STS API operations
such as AssumeRole or GetFederationToken.
Amazon MSK supports using temporary credentials.
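For example, the following AWS CLI sketch obtains temporary credentials by assuming a role.
MSKAdminRole is a hypothetical role name; the response contains an access key, secret key, and
session token that clients can use for the duration of the session.

aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/MSKAdminRole \
  --role-session-name msk-admin-session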
Service-linked roles
Service-linked roles allow Amazon Web Services to access resources in other services to complete
an action on your behalf. Service-linked roles appear in your IAM account and are owned by the
service. An administrator can view but not edit the permissions for service-linked roles.
Amazon MSK supports service-linked roles. For details about creating or managing Amazon MSK
service-linked roles, see the section called “Service-linked roles”.
Amazon MSK identity-based policy examples
By default, IAM users and roles don't have permission to execute Amazon MSK API actions. An
administrator must create IAM policies that grant users and roles permission to perform specific
API operations on the specified resources they need. The administrator must then attach those
policies to the IAM users or groups that require those permissions.
To learn how to create an IAM identity-based policy using these example JSON policy documents,
see Creating Policies on the JSON Tab in the IAM User Guide.
Topics
Policy best practices
Allow users to view their own permissions
Accessing one Amazon MSK cluster
Accessing Amazon MSK clusters based on tags
Policy best practices
Identity-based policies determine whether someone can create, access, or delete Amazon MSK
resources in your account. These actions can incur costs for your AWS account. When you create or
edit identity-based policies, follow these guidelines and recommendations:
Get started with AWS managed policies and move toward least-privilege permissions – To
get started granting permissions to your users and workloads, use the AWS managed policies
that grant permissions for many common use cases. They are available in your AWS account. We
recommend that you reduce permissions further by defining AWS customer managed policies
that are specific to your use cases. For more information, see AWS managed policies or AWS
managed policies for job functions in the IAM User Guide.
Apply least-privilege permissions – When you set permissions with IAM policies, grant only the
permissions required to perform a task. You do this by defining the actions that can be taken on
specific resources under specific conditions, also known as least-privilege permissions. For more
information about using IAM to apply permissions, see Policies and permissions in IAM in the
IAM User Guide.
Use conditions in IAM policies to further restrict access – You can add a condition to your
policies to limit access to actions and resources. For example, you can write a policy condition to
specify that all requests must be sent using SSL. You can also use conditions to grant access to
service actions if they are used through a specific AWS service, such as AWS CloudFormation. For
more information, see IAM JSON policy elements: Condition in the IAM User Guide.
Use IAM Access Analyzer to validate your IAM policies to ensure secure and functional
permissions – IAM Access Analyzer validates new and existing policies so that the policies
adhere to the IAM policy language (JSON) and IAM best practices. IAM Access Analyzer provides
more than 100 policy checks and actionable recommendations to help you author secure and
functional policies. For more information, see IAM Access Analyzer policy validation in the IAM
User Guide.
Require multi-factor authentication (MFA) – If you have a scenario that requires IAM users
or a root user in your AWS account, turn on MFA for additional security. To require MFA when
API operations are called, add MFA conditions to your policies. For more information, see
Configuring MFA-protected API access in the IAM User Guide.
For more information about best practices in IAM, see Security best practices in IAM in the IAM User
Guide.
Allow users to view their own permissions
This example shows how you might create a policy that allows IAM users to view the inline and
managed policies that are attached to their user identity. This policy includes permissions to
complete this action on the console or programmatically using the AWS CLI or AWS API.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ViewOwnUserInfo",
"Effect": "Allow",
"Action": [
"iam:GetUserPolicy",
"iam:ListGroupsForUser",
"iam:ListAttachedUserPolicies",
"iam:ListUserPolicies",
"iam:GetUser"
],
"Resource": ["arn:aws:iam::*:user/${aws:username}"]
},
{
"Sid": "NavigateInConsole",
"Effect": "Allow",
"Action": [
"iam:GetGroupPolicy",
"iam:GetPolicyVersion",
"iam:GetPolicy",
"iam:ListAttachedGroupPolicies",
"iam:ListGroupPolicies",
"iam:ListPolicyVersions",
"iam:ListPolicies",
"iam:ListUsers"
],
"Resource": "*"
}
]
}
Accessing one Amazon MSK cluster
In this example, you want to grant an IAM user in your Amazon Web Services account access to one
of your clusters, purchaseQueriesCluster. This policy allows the user to describe the cluster,
get its bootstrap brokers, list its broker nodes, and update it.
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"UpdateCluster",
"Effect":"Allow",
"Action":[
"kafka:Describe*",
"kafka:Get*",
"kafka:List*",
"kafka:Update*"
],
"Resource":"arn:aws:kafka:us-east-1:012345678012:cluster/
purchaseQueriesCluster/abcdefab-1234-abcd-5678-cdef0123ab01-2"
}
]
}
Accessing Amazon MSK clusters based on tags
You can use conditions in your identity-based policy to control access to Amazon MSK resources
based on tags. This example shows how you might create a policy that allows the user to describe
the cluster, get its bootstrap brokers, list its broker nodes, update it, and delete it. However,
permission is granted only if the cluster tag Owner has the value of that user's user name.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AccessClusterIfOwner",
"Effect": "Allow",
"Action": [
"kafka:Describe*",
"kafka:Get*",
"kafka:List*",
"kafka:Update*",
"kafka:Delete*"
],
"Resource": "arn:aws:kafka:us-east-1:012345678012:cluster/*",
"Condition": {
"StringEquals": {
"aws:ResourceTag/Owner": "${aws:username}"
}
}
}
]
}
You can attach this policy to the IAM users in your account. If a user named richard-roe
attempts to update an MSK cluster, the cluster must be tagged Owner=richard-roe or
owner=richard-roe. Otherwise, he is denied access. The condition tag key Owner matches both
Owner and owner because condition key names are not case-sensitive. For more information, see
IAM JSON Policy Elements: Condition in the IAM User Guide.
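For example, an administrator could apply the required tag with the AWS CLI before richard-roe
needs access. The cluster ARN below reuses the placeholder values from the policy above.

aws kafka tag-resource \
  --resource-arn arn:aws:kafka:us-east-1:012345678012:cluster/purchaseQueriesCluster/abcdefab-1234-abcd-5678-cdef0123ab01-2 \
  --tags Owner=richard-roe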
Service-linked roles for Amazon MSK
Amazon MSK uses AWS Identity and Access Management (IAM) service-linked roles. A service-
linked role is a unique type of IAM role that is linked directly to Amazon MSK. Service-linked roles
are predefined by Amazon MSK and include all the permissions that the service requires to call
other AWS services on your behalf.
A service-linked role makes setting up Amazon MSK easier because you do not have to manually
add the necessary permissions. Amazon MSK defines the permissions of its service-linked roles.
Unless defined otherwise, only Amazon MSK can assume its roles. The defined permissions include
the trust policy and the permissions policy, and that permissions policy cannot be attached to any
other IAM entity.
For information about other services that support service-linked roles, see Amazon Web Services
That Work with IAM, and look for the services that have Yes in the Service-Linked Role column.
Choose a Yes with a link to view the service-linked role documentation for that service.
Topics
Service-linked role permissions for Amazon MSK
Create a service-linked role for Amazon MSK
Edit a service-linked role for Amazon MSK
Supported Regions for Amazon MSK service-linked roles
Service-linked role permissions for Amazon MSK
Amazon MSK uses the service-linked role named AWSServiceRoleForKafka. Amazon MSK uses this
role to access your resources and perform operations such as:
*NetworkInterface – create and manage network interfaces in the customer account that
make cluster brokers accessible to clients in the customer VPC.
*VpcEndpoints – manage VPC endpoints in the customer account that make cluster brokers
accessible to clients in the customer VPC using AWS PrivateLink. Amazon MSK uses permissions
for DescribeVpcEndpoints, ModifyVpcEndpoint, and DeleteVpcEndpoints.
secretsmanager – manage client credentials with AWS Secrets Manager.
GetCertificateAuthorityCertificate – retrieve the certificate for your private certificate
authority.
This service-linked role is attached to the following managed policy: KafkaServiceRolePolicy.
For updates to this policy, see KafkaServiceRolePolicy.
The AWSServiceRoleForKafka service-linked role trusts the following services to assume the role:
kafka.amazonaws.com
The role permissions policy allows Amazon MSK to complete the following actions on resources.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:CreateNetworkInterfacePermission",
"ec2:AttachNetworkInterface",
"ec2:DeleteNetworkInterface",
"ec2:DetachNetworkInterface",
"ec2:DescribeVpcEndpoints",
"acm-pca:GetCertificateAuthorityCertificate",
"secretsmanager:ListSecrets"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:ModifyVpcEndpoint"
],
"Resource": "arn:*:ec2:*:*:subnet/*"
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteVpcEndpoints",
"ec2:ModifyVpcEndpoint"
],
"Resource": "arn:*:ec2:*:*:vpc-endpoint/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/AWSMSKManaged": "true"
},
"StringLike": {
"ec2:ResourceTag/ClusterArn": "*"
}
}
},
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetResourcePolicy",
"secretsmanager:PutResourcePolicy",
"secretsmanager:DeleteResourcePolicy",
"secretsmanager:DescribeSecret"
],
"Resource": "*",
"Condition": {
"ArnLike": {
"secretsmanager:SecretId": "arn:*:secretsmanager:*:*:secret:AmazonMSK_*"
}
}
}
]
}
You must configure permissions to allow an IAM entity (such as a user, group, or role) to create,
edit, or delete a service-linked role. For more information, see Service-Linked Role Permissions in
the IAM User Guide.
Create a service-linked role for Amazon MSK
You don't need to create a service-linked role manually. When you create an Amazon MSK cluster
in the AWS Management Console, the AWS CLI, or the AWS API, Amazon MSK creates the service-
linked role for you.
If you delete this service-linked role, and then need to create it again, you can use the same process
to recreate the role in your account. When you create an Amazon MSK cluster, Amazon MSK creates
the service-linked role for you again.
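If you need to recreate the role explicitly (for example, before creating a cluster), you can use the
standard IAM command, as in this sketch:

aws iam create-service-linked-role --aws-service-name kafka.amazonaws.com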
Edit a service-linked role for Amazon MSK
Amazon MSK does not allow you to edit the AWSServiceRoleForKafka service-linked role. After
you create a service-linked role, you cannot change the name of the role because various entities
might reference the role. However, you can edit the description of the role using IAM. For more
information, see Editing a Service-Linked Role in the IAM User Guide.
Supported Regions for Amazon MSK service-linked roles
Amazon MSK supports using service-linked roles in all of the Regions where the service is available.
For more information, see AWS Regions and Endpoints.
AWS managed policies for Amazon MSK
An AWS managed policy is a standalone policy that is created and administered by AWS. AWS
managed policies are designed to provide permissions for many common use cases so that you can
start assigning permissions to users, groups, and roles.
Keep in mind that AWS managed policies might not grant least-privilege permissions for your
specific use cases because they're available for all AWS customers to use. We recommend that you
reduce permissions further by defining customer managed policies that are specific to your use
cases.
You cannot change the permissions defined in AWS managed policies. If AWS updates the
permissions defined in an AWS managed policy, the update affects all principal identities (users,
groups, and roles) that the policy is attached to. AWS is most likely to update an AWS managed
policy when a new AWS service is launched or new API operations become available for existing
services.
For more information, see AWS managed policies in the IAM User Guide.
AWS managed policy: AmazonMSKFullAccess
This policy grants administrative permissions that allow a principal full access to all Amazon MSK
actions. The permissions in this policy are grouped as follows:
The Amazon MSK permissions allow all Amazon MSK actions.
Amazon EC2 permissions – Some Amazon EC2 permissions in this policy are required to validate
the resources passed in an API request, to make sure that Amazon MSK is able to use those
resources with a cluster. The rest of the Amazon EC2 permissions in this policy allow Amazon MSK
to create the AWS resources that are needed to make it possible for you to connect to your clusters.
AWS KMS permissions – are used during API calls to validate the passed resources in a request.
They are required for Amazon MSK to be able to use the passed key with the Amazon MSK
cluster.
CloudWatch Logs, Amazon S3, and Amazon Data Firehose permissions – are required
for Amazon MSK to be able to ensure that the log delivery destinations are reachable, and that
they are valid for broker log use.
IAM permissions – are required for Amazon MSK to be able to create a service-linked role in your
account and to allow you to pass a service execution role to Amazon MSK.
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"kafka:*",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"ec2:DescribeSecurityGroups",
"ec2:DescribeRouteTables",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVpcAttribute",
"kms:DescribeKey",
"kms:CreateGrant",
"logs:CreateLogDelivery",
"logs:GetLogDelivery",
"logs:UpdateLogDelivery",
"logs:DeleteLogDelivery",
"logs:ListLogDeliveries",
"logs:PutResourcePolicy",
"logs:DescribeResourcePolicies",
"logs:DescribeLogGroups",
"S3:GetBucketPolicy",
"firehose:TagDeliveryStream"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateVpcEndpoint"
],
"Resource": [
"arn:*:ec2:*:*:vpc/*",
"arn:*:ec2:*:*:subnet/*",
"arn:*:ec2:*:*:security-group/*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateVpcEndpoint"
],
"Resource": [
"arn:*:ec2:*:*:vpc-endpoint/*"
],
"Condition": {
"StringEquals": {
"aws:RequestTag/AWSMSKManaged": "true"
},
"StringLike": {
"aws:RequestTag/ClusterArn": "*"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateTags"
],
"Resource": "arn:*:ec2:*:*:vpc-endpoint/*",
"Condition": {
"StringEquals": {
"ec2:CreateAction": "CreateVpcEndpoint"
}
}
},
{
"Effect": "Allow",
"Action": [
"ec2:DeleteVpcEndpoints"
],
"Resource": "arn:*:ec2:*:*:vpc-endpoint/*",
"Condition": {
"StringEquals": {
"ec2:ResourceTag/AWSMSKManaged": "true"
},
"StringLike": {
"ec2:ResourceTag/ClusterArn": "*"
}
}
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "*",
"Condition": {
"StringEquals": {
"iam:PassedToService": "kafka.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "arn:aws:iam::*:role/aws-service-role/kafka.amazonaws.com/
AWSServiceRoleForKafka*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "kafka.amazonaws.com"
}
}
},
{
"Effect": "Allow",
"Action": [
"iam:AttachRolePolicy",
"iam:PutRolePolicy"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/kafka.amazonaws.com/
AWSServiceRoleForKafka*"
},
{
"Effect": "Allow",
"Action": "iam:CreateServiceLinkedRole",
"Resource": "arn:aws:iam::*:role/aws-service-role/delivery.logs.amazonaws.com/
AWSServiceRoleForLogDelivery*",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "delivery.logs.amazonaws.com"
}
}
}
]
}
AWS managed policy: AmazonMSKReadOnlyAccess
This policy grants read-only permissions that allow users to view information in Amazon MSK.
Principals with this policy attached can't make any updates or delete existing resources, nor can
they create new Amazon MSK resources. For example, principals with these permissions can view the list
of clusters and configurations associated with their account, but cannot change the configuration
or settings of any clusters. The permissions in this policy are grouped as follows:
Amazon MSK permissions – allow you to list Amazon MSK resources, describe them, and get
information about them.
Amazon EC2 permissions – are used to describe the Amazon VPC, subnets, security groups, and
ENIs that are associated with a cluster.
AWS KMS permission – is used to describe the key that is associated with the cluster.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"kafka:Describe*",
"kafka:List*",
"kafka:Get*",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"kms:DescribeKey"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
AWS managed policy: KafkaServiceRolePolicy
You can't attach KafkaServiceRolePolicy to your IAM entities. This policy is attached to a service-
linked role that allows Amazon MSK to perform actions such as managing VPC endpoints
(connectors) on MSK clusters, managing network interfaces, and managing cluster credentials with
AWS Secrets Manager. For more information, see the section called “Service-linked roles”.
AWS managed policy: AWSMSKReplicatorExecutionRole
The AWSMSKReplicatorExecutionRole policy grants permissions to the Amazon MSK replicator
to replicate data between MSK clusters. The permissions in this policy are grouped as follows:
cluster – Grants the Amazon MSK Replicator permissions to connect to the cluster using IAM
authentication. Also grants permissions to describe and alter the cluster.
topic – Grants the Amazon MSK Replicator permissions to describe, create, and alter a topic,
and to alter the topic's dynamic configuration.
consumer group – Grants the Amazon MSK Replicator permissions to describe and alter
consumer groups, to read and write data from an MSK cluster, and to delete internal topics
created by the replicator.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ClusterPermissions",
"Effect": "Allow",
"Action": [
"kafka-cluster:Connect",
"kafka-cluster:DescribeCluster",
"kafka-cluster:AlterCluster",
"kafka-cluster:DescribeTopic",
"kafka-cluster:CreateTopic",
"kafka-cluster:AlterTopic",
"kafka-cluster:WriteData",
"kafka-cluster:ReadData",
"kafka-cluster:AlterGroup",
"kafka-cluster:DescribeGroup",
"kafka-cluster:DescribeTopicDynamicConfiguration",
"kafka-cluster:AlterTopicDynamicConfiguration",
"kafka-cluster:WriteDataIdempotently"
],
"Resource": [
"arn:aws:kafka:*:*:cluster/*"
]
},
{
"Sid": "TopicPermissions",
"Effect": "Allow",
"Action": [
"kafka-cluster:DescribeTopic",
"kafka-cluster:CreateTopic",
"kafka-cluster:AlterTopic",
"kafka-cluster:WriteData",
"kafka-cluster:ReadData",
"kafka-cluster:DescribeTopicDynamicConfiguration",
"kafka-cluster:AlterTopicDynamicConfiguration",
"kafka-cluster:AlterCluster"
],
"Resource": [
"arn:aws:kafka:*:*:topic/*/*"
]
},
{
"Sid": "GroupPermissions",
"Effect": "Allow",
"Action": [
"kafka-cluster:AlterGroup",
"kafka-cluster:DescribeGroup"
],
"Resource": [
"arn:aws:kafka:*:*:group/*/*"
]
}
]
}
Amazon MSK updates to AWS managed policies
View details about updates to AWS managed policies for Amazon MSK since this service began
tracking these changes.
Change – Description – Date

WriteDataIdempotently permission added to AWSMSKReplicatorExecutionRole (update to an
existing policy) – Amazon MSK added the WriteDataIdempotently permission to the
AWSMSKReplicatorExecutionRole policy to support data replication between MSK clusters. –
March 12, 2024

AWSMSKReplicatorExecutionRole (new policy) – Amazon MSK added the
AWSMSKReplicatorExecutionRole policy to support Amazon MSK Replicator. – December 4, 2023

AmazonMSKFullAccess (update to an existing policy) – Amazon MSK added permissions to support
Amazon MSK Replicator. – September 28, 2023

KafkaServiceRolePolicy (update to an existing policy) – Amazon MSK added permissions to support
multi-VPC private connectivity. – March 8, 2023

AmazonMSKFullAccess (update to an existing policy) – Amazon MSK added new Amazon EC2
permissions to make it possible to connect to a cluster. – November 30, 2021

AmazonMSKFullAccess (update to an existing policy) – Amazon MSK added a new permission to
allow it to describe Amazon EC2 route tables. – November 19, 2021

Amazon MSK started tracking changes – Amazon MSK started tracking changes for its AWS
managed policies. – November 19, 2021
Troubleshoot Amazon MSK identity and access
Use the following information to help you diagnose and fix common issues that you might
encounter when working with Amazon MSK and IAM.
Topics
I am not authorized to perform an action in Amazon MSK
I am not authorized to perform an action in Amazon MSK
If the AWS Management Console tells you that you're not authorized to perform an action, then
you must contact your administrator for assistance. Your administrator is the person that provided
you with your sign-in credentials.
The following example error occurs when the mateojackson IAM user tries to use the console to
delete a cluster but does not have kafka:DeleteCluster permissions.
User: arn:aws:iam::123456789012:user/mateojackson is not authorized to perform:
kafka:DeleteCluster on resource: purchaseQueriesCluster
In this case, Mateo asks his administrator to update his policies to allow him to access the
purchaseQueriesCluster resource using the kafka:DeleteCluster action.
Authentication and authorization for Apache Kafka APIs
You can use IAM to authenticate clients and to allow or deny Apache Kafka actions. Alternatively,
you can use TLS or SASL/SCRAM to authenticate clients, and Apache Kafka ACLs to allow or deny
actions.
For information on how to control who can perform Amazon MSK operations on your cluster, see
the section called “Authentication and authorization for Amazon MSK APIs”.
Topics
IAM access control
Mutual TLS client authentication for Amazon MSK
Sign-in credentials authentication with AWS Secrets Manager
Apache Kafka ACLs
IAM access control
IAM access control for Amazon MSK enables you to handle both authentication and authorization
for your MSK cluster. This eliminates the need to use one mechanism for authentication and
another for authorization. For example, when a client tries to write to your cluster, Amazon MSK
uses IAM to check whether that client is an authenticated identity and also whether it is authorized
to produce to your cluster. IAM access control works for Java and non-Java clients, including Kafka
clients written in Python, Go, JavaScript, and .NET.
Amazon MSK logs access events so you can audit them. For more information, see the section
called “CloudTrail events”.
To make IAM access control possible, Amazon MSK makes minor modifications to Apache Kafka
source code. These modifications won't cause a noticeable difference in your Apache Kafka
experience.
Important
IAM access control doesn't apply to Apache ZooKeeper nodes. For information about how you can control access to those nodes, see the section called “Controlling access to Apache ZooKeeper”.
Important
The allow.everyone.if.no.acl.found Apache Kafka setting has no effect if your
cluster uses IAM access control.
Important
You can invoke Apache Kafka ACL APIs for an MSK cluster that uses IAM access control.
However, Apache Kafka ACLs have no effect on authorization for IAM roles. You must use
IAM policies to control access for IAM roles.
How IAM access control for Amazon MSK works
To use IAM access control for Amazon MSK, perform the following steps, which are described in detail in these topics:
the section called “Create an Amazon MSK cluster that uses IAM access control”
the section called “Configure clients for IAM access control”
the section called “Create authorization policies for the IAM role”
the section called “Get the bootstrap brokers for IAM access control”
Create an Amazon MSK cluster that uses IAM access control
This section explains how you can use the AWS Management Console, the API, or the AWS CLI to create an Amazon MSK cluster that uses IAM access control. For information about how to turn on IAM access control for an existing cluster, see the section called “Update Amazon MSK cluster security”.
Use the AWS Management Console to create a cluster that uses IAM access control
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose Create cluster.
3. Choose Create cluster with custom settings.
4. In the Authentication section, choose IAM access control.
5. Complete the rest of the workflow for creating a cluster.
Use the API or the AWS CLI to create a cluster that uses IAM access control
To create a cluster with IAM access control enabled, use the CreateCluster API or the create-cluster CLI command, and pass the following JSON for the ClientAuthentication parameter: "ClientAuthentication": { "Sasl": { "Iam": { "Enabled": true } } }.
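For example, the following is a minimal sketch of such a create-cluster call. The cluster name, Kafka version, and brokernodegroupinfo.json file are placeholders for your own values:

aws kafka create-cluster \
    --cluster-name "IamAuthCluster" \
    --broker-node-group-info file://brokernodegroupinfo.json \
    --kafka-version "3.5.1" \
    --number-of-broker-nodes 3 \
    --client-authentication '{"Sasl": {"Iam": {"Enabled": true}}}'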
Configure clients for IAM access control
To enable clients to communicate with an MSK cluster that uses IAM access control, you can use
either of these mechanisms:
Non-Java client configuration using SASL_OAUTHBEARER mechanism
Java client configuration using SASL_OAUTHBEARER mechanism or AWS_MSK_IAM mechanism
Use the SASL_OAUTHBEARER mechanism to configure IAM
1. Edit your client configuration, using the example Python Kafka client below as a guide. Configuration changes are similar in other languages.
#!/usr/bin/python3
from kafka import KafkaProducer
from kafka.errors import KafkaError
import socket
import time
from aws_msk_iam_sasl_signer import MSKAuthTokenProvider

# Token provider that generates an IAM auth token for the MSK cluster
class MSKTokenProvider():
    def token(self):
        token, _ = MSKAuthTokenProvider.generate_auth_token('<my aws region>')
        return token

tp = MSKTokenProvider()

producer = KafkaProducer(
    bootstrap_servers='<my bootstrap string>',
    security_protocol='SASL_SSL',
    sasl_mechanism='OAUTHBEARER',
    sasl_oauth_token_provider=tp,
    client_id=socket.gethostname(),
)

topic = "<my-topic>"
while True:
    try:
        inp = input(">")
        producer.send(topic, inp.encode())
        producer.flush()
        print("Produced!")
    except Exception as e:
        print("Failed to send message:", e)

producer.close()
2. Download the helper library for your chosen configuration language and follow the instructions
in the Getting started section on that language library’s homepage.
JavaScript: https://github.com/aws/aws-msk-iam-sasl-signer-js#getting-started
Python: https://github.com/aws/aws-msk-iam-sasl-signer-python#get-started
Go: https://github.com/aws/aws-msk-iam-sasl-signer-go#getting-started
.NET: https://github.com/aws/aws-msk-iam-sasl-signer-net#getting-started
Java: SASL_OAUTHBEARER support for Java is available through the aws-msk-iam-auth JAR file
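For the consuming side, the following is a minimal consumer sketch that reuses the same token provider pattern as the producer example above; the region, bootstrap string, topic, and group ID are placeholders:

#!/usr/bin/python3
from kafka import KafkaConsumer
import socket
from aws_msk_iam_sasl_signer import MSKAuthTokenProvider

# Same IAM token provider pattern as in the producer example
class MSKTokenProvider():
    def token(self):
        token, _ = MSKAuthTokenProvider.generate_auth_token('<my aws region>')
        return token

consumer = KafkaConsumer(
    '<my-topic>',
    bootstrap_servers='<my bootstrap string>',
    security_protocol='SASL_SSL',
    sasl_mechanism='OAUTHBEARER',
    sasl_oauth_token_provider=MSKTokenProvider(),
    client_id=socket.gethostname(),
    group_id='<my-consumer-group>',
    auto_offset_reset='earliest',
)

# Print each record's value as it arrives
for message in consumer:
    print(message.value.decode('utf-8'))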
Use the MSK custom AWS_MSK_IAM mechanism to configure IAM
1.
Add the following to the client.properties file. Replace
<PATH_TO_TRUST_STORE_FILE> with the fully-qualified path to the trust store file on the
client.
Note
If you don't want to use a specific certificate, you can remove
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE> from
your client.properties file. When you don't specify a value for
ssl.truststore.location, the Java process uses the default certificate.
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE>
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
To use a named profile that you created for AWS credentials, include
awsProfileName="your profile name"; in your client configuration file. For information
about named profiles, see Named profiles in the AWS CLI documentation.
2. Download the latest stable aws-msk-iam-auth JAR file, and place it in the class path. If you use
Maven, add the following dependency, adjusting the version number as needed:
<dependency>
    <groupId>software.amazon.msk</groupId>
    <artifactId>aws-msk-iam-auth</artifactId>
    <version>1.0.0</version>
</dependency>
The Amazon MSK client plugin is open-sourced under the Apache 2.0 license.
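For example, once client.properties is set up and the JAR is on the class path, you can exercise the configuration with the console clients that ship with Apache Kafka; BootstrapBrokerStringSaslIam and ExampleTopic are placeholders for your own values:

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --bootstrap-server BootstrapBrokerStringSaslIam --topic ExampleTopic --producer.config client.properties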
Create authorization policies for the IAM role
Attach an authorization policy to the IAM role that corresponds to the client. In an authorization
policy, you specify which actions to allow or deny for the role. If your client is on an Amazon
EC2 instance, associate the authorization policy with the IAM role for that Amazon EC2 instance.
Alternatively, you can configure your client to use a named profile, and then associate the authorization policy with the role for that named profile. The section called “Configure clients for IAM access control” describes how to configure a client to use a named profile.
For information about how to create an IAM policy, see Creating IAM policies.
The following is an example authorization policy for a cluster named MyTestCluster. To understand the semantics of the Action and Resource elements, see the section called “Semantics of IAM authorization policy actions and resources”.
Important
Changes that you make to an IAM policy are reflected in the IAM APIs and the AWS CLI
immediately. However, it can take noticeable time for the policy change to take effect.
In most cases, policy changes take effect in less than a minute. Network conditions may
sometimes increase the delay.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*Topic*",
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:group/MyTestCluster/*"
            ]
        }
    ]
}
To learn how to create a policy with action elements that correspond to common Apache Kafka use cases, like producing and consuming data, see the section called “Common use cases for client authorization policy”.
For Kafka versions 2.8.0 and above, the WriteDataIdempotently permission is deprecated (KIP-679), and enable.idempotence = true is set by default. Therefore, for Kafka versions 2.8.0 and above, IAM does not offer the same functionality as Kafka ACLs. It is not possible to WriteDataIdempotently to a topic by only providing WriteData access to that topic. This does not affect the case when WriteData is provided to ALL topics. In that case, WriteDataIdempotently is allowed. This is due to differences between the implementation of IAM logic and the implementation of Kafka ACLs.
To work around this, we recommend using a policy similar to the following sample:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster",
                "kafka-cluster:WriteDataIdempotently"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:*Topic*",
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData"
            ],
            "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1/TestTopic"
            ]
        }
    ]
}
In this case, WriteData allows writes to TestTopic, while WriteDataIdempotently allows idempotent writes to the cluster. It is important to note that WriteDataIdempotently is a cluster-level permission; it cannot be used at the topic level. If WriteDataIdempotently is restricted to the topic level, this policy will not work.
Get the bootstrap brokers for IAM access control
See the section called “Get the bootstrap brokers for an Amazon MSK cluster”.
Semantics of IAM authorization policy actions and resources
This section explains the semantics of the action and resource elements that you can use in an IAM
authorization policy. For an example policy, see the section called “Create authorization policies for
the IAM role”.
Authorization policy actions
The following list describes the actions that you can include in an authorization policy when you use IAM access control for Amazon MSK. When you include an action from the Action field in your authorization policy, you must also include the corresponding actions from its Required actions field.
Action: kafka-cluster:Connect
Description: Grants permission to connect and authenticate to the cluster.
Required actions: None
Required resources: cluster
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeCluster
Description: Grants permission to describe various aspects of the cluster, equivalent to Apache Kafka's DESCRIBE CLUSTER ACL.
Required actions: kafka-cluster:Connect
Required resources: cluster
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterCluster
Description: Grants permission to alter various aspects of the cluster, equivalent to Apache Kafka's ALTER CLUSTER ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeCluster
Required resources: cluster
Applicable to serverless clusters: No

Action: kafka-cluster:DescribeClusterDynamicConfiguration
Description: Grants permission to describe the dynamic configuration of a cluster, equivalent to Apache Kafka's DESCRIBE_CONFIGS CLUSTER ACL.
Required actions: kafka-cluster:Connect
Required resources: cluster
Applicable to serverless clusters: No

Action: kafka-cluster:AlterClusterDynamicConfiguration
Description: Grants permission to alter the dynamic configuration of a cluster, equivalent to Apache Kafka's ALTER_CONFIGS CLUSTER ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeClusterDynamicConfiguration
Required resources: cluster
Applicable to serverless clusters: No

Action: kafka-cluster:WriteDataIdempotently
Description: Grants permission to write data idempotently on a cluster, equivalent to Apache Kafka's IDEMPOTENT_WRITE CLUSTER ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:WriteData
Required resources: cluster
Applicable to serverless clusters: Yes

Action: kafka-cluster:CreateTopic
Description: Grants permission to create topics on a cluster, equivalent to Apache Kafka's CREATE CLUSTER/TOPIC ACL.
Required actions: kafka-cluster:Connect
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeTopic
Description: Grants permission to describe topics on a cluster, equivalent to Apache Kafka's DESCRIBE TOPIC ACL.
Required actions: kafka-cluster:Connect
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterTopic
Description: Grants permission to alter topics on a cluster, equivalent to Apache Kafka's ALTER TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DeleteTopic
Description: Grants permission to delete topics on a cluster, equivalent to Apache Kafka's DELETE TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeTopicDynamicConfiguration
Description: Grants permission to describe the dynamic configuration of topics on a cluster, equivalent to Apache Kafka's DESCRIBE_CONFIGS TOPIC ACL.
Required actions: kafka-cluster:Connect
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterTopicDynamicConfiguration
Description: Grants permission to alter the dynamic configuration of topics on a cluster, equivalent to Apache Kafka's ALTER_CONFIGS TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopicDynamicConfiguration
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:ReadData
Description: Grants permission to read data from topics on a cluster, equivalent to Apache Kafka's READ TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:AlterGroup
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:WriteData
Description: Grants permission to write data to topics on a cluster, equivalent to Apache Kafka's WRITE TOPIC ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic
Required resources: topic
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeGroup
Description: Grants permission to describe groups on a cluster, equivalent to Apache Kafka's DESCRIBE GROUP ACL.
Required actions: kafka-cluster:Connect
Required resources: group
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterGroup
Description: Grants permission to join groups on a cluster, equivalent to Apache Kafka's READ GROUP ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeGroup
Required resources: group
Applicable to serverless clusters: Yes

Action: kafka-cluster:DeleteGroup
Description: Grants permission to delete groups on a cluster, equivalent to Apache Kafka's DELETE GROUP ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeGroup
Required resources: group
Applicable to serverless clusters: Yes

Action: kafka-cluster:DescribeTransactionalId
Description: Grants permission to describe transactional IDs on a cluster, equivalent to Apache Kafka's DESCRIBE TRANSACTIONAL_ID ACL.
Required actions: kafka-cluster:Connect
Required resources: transactional-id
Applicable to serverless clusters: Yes

Action: kafka-cluster:AlterTransactionalId
Description: Grants permission to alter transactional IDs on a cluster, equivalent to Apache Kafka's WRITE TRANSACTIONAL_ID ACL.
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTransactionalId, kafka-cluster:WriteData
Required resources: transactional-id
Applicable to serverless clusters: Yes
You can use the asterisk (*) wildcard any number of times in an action after the colon. The following are examples.

kafka-cluster:*Topic stands for kafka-cluster:CreateTopic, kafka-cluster:DescribeTopic, kafka-cluster:AlterTopic, and kafka-cluster:DeleteTopic. It doesn't include kafka-cluster:DescribeTopicDynamicConfiguration or kafka-cluster:AlterTopicDynamicConfiguration.

kafka-cluster:* stands for all permissions.
Authorization policy resources
The following list shows the four types of resources that you can use in an authorization policy when you use IAM access control for Amazon MSK. You can get the cluster Amazon Resource Name (ARN) from the AWS Management Console or by using the DescribeCluster API or the describe-cluster AWS CLI command. You can then use the cluster ARN to construct topic, group, and transactional ID ARNs. To specify a resource in an authorization policy, use that resource's ARN.
Resource: Cluster
ARN format: arn:aws:kafka:region:account-id:cluster/cluster-name/cluster-uuid

Resource: Topic
ARN format: arn:aws:kafka:region:account-id:topic/cluster-name/cluster-uuid/topic-name

Resource: Group
ARN format: arn:aws:kafka:region:account-id:group/cluster-name/cluster-uuid/group-name

Resource: Transactional ID
ARN format: arn:aws:kafka:region:account-id:transactional-id/cluster-name/cluster-uuid/transactional-id
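For illustration, the following is a minimal Python sketch of constructing resource ARNs from a cluster ARN; the ARN, topic, group, and transactional ID values are placeholders:

# Minimal sketch: derive topic, group, and transactional ID ARNs
# from a cluster ARN (placeholder values).
cluster_arn = "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"

prefix, _, rest = cluster_arn.partition(":cluster/")
topic_arn = f"{prefix}:topic/{rest}/MyTopic"             # append the topic name
group_arn = f"{prefix}:group/{rest}/MyGroup"             # append the group name
txn_arn = f"{prefix}:transactional-id/{rest}/MyTxnId"    # append the transactional ID

print(topic_arn)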
You can use the asterisk (*) wildcard any number of times anywhere in the part of the ARN that comes after :cluster/, :topic/, :group/, and :transactional-id/. The following are some examples of how you can use the asterisk (*) wildcard to refer to multiple resources:

arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*: all the topics in any cluster named MyTestCluster, regardless of the cluster's UUID.

arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1/*_test: all topics whose name ends with "_test" in the cluster whose name is MyTestCluster and whose UUID is abcd1234-0123-abcd-5678-1234abcd-1.

arn:aws:kafka:us-east-1:0123456789012:transactional-id/MyTestCluster/*/5555abcd-1111-abcd-1234-abcd1234-1: all transactions whose transactional ID is 5555abcd-1111-abcd-1234-abcd1234-1, across all incarnations of a cluster named MyTestCluster in your account. This means that if you create a cluster named MyTestCluster, then delete it, and then create another cluster by the same name, you can use this resource ARN to represent the same transactional ID on both clusters. However, the deleted cluster isn't accessible.
Common use cases for client authorization policy
The following list shows some common use cases. To authorize a client to carry out a given use case, include the required actions for that use case in the client's authorization policy, and set Effect to Allow.
For information about all the actions that are part of IAM access control for Amazon MSK, see the
section called “Semantics of IAM authorization policy actions and resources”.
Note
Actions are denied by default. You must explicitly allow every action that you want to
authorize the client to perform.
Use case: Admin
Required actions: kafka-cluster:*

Use case: Create a topic
Required actions: kafka-cluster:Connect, kafka-cluster:CreateTopic

Use case: Produce data
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:WriteData

Use case: Consume data
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:DescribeGroup, kafka-cluster:AlterGroup, kafka-cluster:ReadData

Use case: Produce data idempotently
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:WriteData, kafka-cluster:WriteDataIdempotently

Use case: Produce data transactionally
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:WriteData, kafka-cluster:DescribeTransactionalId, kafka-cluster:AlterTransactionalId

Use case: Describe the configuration of a cluster
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeClusterDynamicConfiguration

Use case: Update the configuration of a cluster
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeClusterDynamicConfiguration, kafka-cluster:AlterClusterDynamicConfiguration

Use case: Describe the configuration of a topic
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopicDynamicConfiguration

Use case: Update the configuration of a topic
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopicDynamicConfiguration, kafka-cluster:AlterTopicDynamicConfiguration

Use case: Alter a topic
Required actions: kafka-cluster:Connect, kafka-cluster:DescribeTopic, kafka-cluster:AlterTopic
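For example, the following is a minimal sketch of an authorization policy for the Consume data use case; the Region, account ID, cluster name, and UUID are placeholders:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect"
            ],
            "Resource": "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:ReadData"
            ],
            "Resource": "arn:aws:kafka:us-east-1:0123456789012:topic/MyTestCluster/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeGroup",
                "kafka-cluster:AlterGroup"
            ],
            "Resource": "arn:aws:kafka:us-east-1:0123456789012:group/MyTestCluster/*"
        }
    ]
}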
Mutual TLS client authentication for Amazon MSK
You can enable client authentication with TLS for connections from your applications to your Amazon MSK brokers. To use client authentication, you need an AWS Private CA. The AWS Private CA can be either in the same AWS account as your cluster or in a different account. For information about AWS Private CAs, see Creating and Managing an AWS Private CA.
Note
TLS authentication is not currently available in the Beijing and Ningxia Regions.
Amazon MSK doesn't support certificate revocation lists (CRLs). To control access to your cluster topics or block compromised certificates, use Apache Kafka ACLs and AWS security groups. For information about using Apache Kafka ACLs, see the section called “Apache Kafka ACLs”.
This topic contains the following sections:
Create an Amazon MSK cluster that supports client authentication
Set up a client to use authentication
Produce and consume messages using authentication
Create an Amazon MSK cluster that supports client authentication
This procedure shows you how to enable client authentication using an AWS Private CA.
Note
We highly recommend using an independent AWS Private CA for each MSK cluster when you use mutual TLS to control access. Doing so ensures that TLS certificates signed by your private CAs authenticate only with a single MSK cluster.
1.
Create a file named clientauthinfo.json with the following contents. Replace Private-
CA-ARN with the ARN of your PCA.
{
    "Tls": {
        "CertificateAuthorityArnList": ["Private-CA-ARN"]
    }
}
2.
Create a file named brokernodegroupinfo.json as described in the section called “Create
a provisioned Amazon MSK cluster using the AWS CLI”.
3. Client authentication requires that you also enable encryption in transit between clients
and brokers. Create a file named encryptioninfo.json with the following contents.
Replace KMS-Key-ARN with the ARN of your KMS key. You can set ClientBroker to TLS or
TLS_PLAINTEXT.
{
    "EncryptionAtRest": {
        "DataVolumeKMSKeyId": "KMS-Key-ARN"
    },
    "EncryptionInTransit": {
        "InCluster": true,
        "ClientBroker": "TLS"
    }
}
For more information about encryption, see the section called “Amazon MSK encryption”.
4. On a machine where you have the AWS CLI installed, run the following command to create a
cluster with authentication and in-transit encryption enabled. Save the cluster ARN provided in
the response.
aws kafka create-cluster --cluster-name "AuthenticationTest" --broker-node-group-
info file://brokernodegroupinfo.json --encryption-info file://encryptioninfo.json
--client-authentication file://clientauthinfo.json --kafka-version "{YOUR KAFKA
VERSION}" --number-of-broker-nodes 3
Set up a client to use authentication
This process describes how to set up an Amazon EC2 instance as a client machine that uses authentication, and how to produce and consume messages using authentication by creating a topic and configuring the required security settings.
1. Create an Amazon EC2 instance to use as a client machine. For simplicity, create this instance
in the same VPC you used for the cluster. See the section called “Create a client machine” for
an example of how to create such a client machine.
2. Create a topic. For an example, see the instructions under the section called “Create a topic in
the Amazon MSK cluster”.
3. On a machine where you have the AWS CLI installed, run the following command to get the
bootstrap brokers of the cluster. Replace Cluster-ARN with the ARN of your cluster.
aws kafka get-bootstrap-brokers --cluster-arn Cluster-ARN
Save the string associated with BootstrapBrokerStringTls in the response.
4. On your client machine, run the following command to use the JVM trust store to create your
client trust store. If your JVM path is different, adjust the command accordingly.
cp /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86_64/jre/lib/security/
cacerts kafka.client.truststore.jks
5. On your client machine, run the following command to create a private key for your client.
Replace Distinguished-Name, Example-Alias, Your-Store-Pass, and Your-Key-Pass
with strings of your choice.
keytool -genkey -keystore kafka.client.keystore.jks -validity 300 -storepass Your-
Store-Pass -keypass Your-Key-Pass -dname "CN=Distinguished-Name" -alias Example-
Alias -storetype pkcs12
6. On your client machine, run the following command to create a certificate request with the
private key you created in the previous step.
keytool -keystore kafka.client.keystore.jks -certreq -file client-cert-sign-request
-alias Example-Alias -storepass Your-Store-Pass -keypass Your-Key-Pass
7.
Open the client-cert-sign-request file and ensure that it starts with -----BEGIN
CERTIFICATE REQUEST----- and ends with -----END CERTIFICATE REQUEST-----. If
it starts with -----BEGIN NEW CERTIFICATE REQUEST-----, delete the word NEW (and
the single space that follows it) from the beginning and the end of the file.
8. On a machine where you have the AWS CLI installed, run the following command to sign your
certificate request. Replace Private-CA-ARN with the ARN of your PCA. You can change the
validity value if you want. Here we use 300 as an example.
aws acm-pca issue-certificate --certificate-authority-arn Private-CA-ARN --csr
fileb://client-cert-sign-request --signing-algorithm "SHA256WITHRSA" --validity
Value=300,Type="DAYS"
Save the certificate ARN provided in the response.
Note
To retrieve your client certificate, use the acm-pca get-certificate command and
specify your certificate ARN. For more information, see get-certificate in the AWS CLI
Command Reference.
9. Run the following command to get the certificate that AWS Private CA signed for you. Replace
Certificate-ARN with the ARN you obtained from the response to the previous command.
aws acm-pca get-certificate --certificate-authority-arn Private-CA-ARN --
certificate-arn Certificate-ARN
10. From the JSON result of running the previous command, copy the strings associated with
Certificate and CertificateChain. Paste these two strings in a new file named signed-
certificate-from-acm. Paste the string associated with Certificate first, followed by the
string associated with CertificateChain. Replace the \n characters with new lines. The
following is the structure of the file after you paste the certificate and certificate chain in it.
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
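Alternatively, the following sketch builds this file directly with the AWS CLI. With --output text, the CLI renders the escaped newline characters as real line breaks; verify that the resulting file matches the structure shown above:

aws acm-pca get-certificate --certificate-authority-arn Private-CA-ARN \
    --certificate-arn Certificate-ARN --query Certificate --output text > signed-certificate-from-acm
aws acm-pca get-certificate --certificate-authority-arn Private-CA-ARN \
    --certificate-arn Certificate-ARN --query CertificateChain --output text >> signed-certificate-from-acm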
11. Run the following command on the client machine to add this certificate to your keystore so
you can present it when you talk to the MSK brokers.
keytool -keystore kafka.client.keystore.jks -import -file signed-certificate-from-
acm -alias Example-Alias -storepass Your-Store-Pass -keypass Your-Key-Pass
12.
Create a file named client.properties with the following contents. Adjust the truststore and keystore locations to the paths where you saved kafka.client.truststore.jks and kafka.client.keystore.jks. Substitute your Kafka client version for the {YOUR KAFKA VERSION} placeholders.
security.protocol=SSL
ssl.truststore.location=/tmp/kafka_2.12-{YOUR KAFKA VERSION}/
kafka.client.truststore.jks
ssl.keystore.location=/tmp/kafka_2.12-{YOUR KAFKA VERSION}/
kafka.client.keystore.jks
ssl.keystore.password=Your-Store-Pass
ssl.key.password=Your-Key-Pass
Produce and consume messages using authentication
This process describes how to produce and consume messages using authentication.
1.
Run the following command to create a topic. The file named client.properties is the one
you created in the previous procedure.
<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-
server BootstrapBroker-String --replication-factor 3 --partitions 1 --topic
ExampleTopic --command-config client.properties
2.
Run the following command to start a console producer. The file named client.properties
is the one you created in the previous procedure.
<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --bootstrap-
server BootstrapBroker-String --topic ExampleTopic --producer.config
client.properties
3. In a new command window on your client machine, run the following command to start a
console consumer.
<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-
server BootstrapBroker-String --topic ExampleTopic --consumer.config
client.properties
4. Type messages in the producer window and watch them appear in the consumer window.
Sign-in credentials authentication with AWS Secrets Manager
You can control access to your Amazon MSK clusters using sign-in credentials that are stored
and secured using AWS Secrets Manager. Storing user credentials in Secrets Manager reduces the
overhead of cluster authentication such as auditing, updating, and rotating credentials. Secrets
Manager also lets you share user credentials across clusters.
This topic contains the following sections:
How sign-in credentials authentication works
Set up SASL/SCRAM authentication for an Amazon MSK cluster
Working with users
Limitations when using SCRAM secrets
How sign-in credentials authentication works
Sign-in credentials authentication for Amazon MSK uses SASL/SCRAM (Simple Authentication
and Security Layer/ Salted Challenge Response Mechanism) authentication. To set up sign-in
credentials authentication for a cluster, you create a Secret resource in AWS Secrets Manager, and
associate sign-in credentials with that secret.
SASL/SCRAM is defined in RFC 5802. SCRAM uses secure hashing algorithms and does not transmit plaintext sign-in credentials between client and server.
Note
When you set up SASL/SCRAM authentication for your cluster, Amazon MSK turns on TLS
encryption for all traffic between clients and brokers.
Set up SASL/SCRAM authentication for an Amazon MSK cluster
To set up a secret in AWS Secrets Manager, follow the Creating and Retrieving a Secret tutorial in
the AWS Secrets Manager User Guide.
Note the following requirements when creating a secret for an Amazon MSK cluster:
Choose Other type of secrets (e.g. API key) for the secret type.
Your secret name must begin with the prefix AmazonMSK_.
You must either use an existing custom AWS KMS key or create a new custom AWS KMS key for
your secret. Secrets Manager uses the default AWS KMS key for a secret by default.
Important
A secret created with the default AWS KMS key cannot be used with an Amazon MSK
cluster.
Your sign-in credential data must be in the following format to enter key-value pairs using the
Plaintext option.
{
"username": "alice",
"password": "alice-secret"
}
Record the ARN (Amazon Resource Name) value for your secret.
Important
You can't associate a Secrets Manager secret with a cluster that exceeds the limits
described in the section called “ Right-size your cluster: Number of partitions per broker.
If you use the AWS CLI to create the secret, specify a key ID or ARN for the kms-key-id
parameter. Don't specify an alias.
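For example, the following is a minimal sketch of creating such a secret with the AWS CLI; the secret name, credentials, and KMS key ARN are placeholder values:

aws secretsmanager create-secret \
    --name AmazonMSK_MyClusterSecret \
    --secret-string '{"username":"alice","password":"alice-secret"}' \
    --kms-key-id KMS-Key-ARN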
To associate the secret with your cluster, use either the Amazon MSK console, or the
BatchAssociateScramSecret operation.
Important
When you associate a secret with a cluster, Amazon MSK attaches a resource policy to the
secret that allows your cluster to access and read the secret values that you defined. You
should not modify this resource policy. Doing so can prevent your cluster from accessing
your secret.
The following example JSON input for the BatchAssociateScramSecret operation associates
a secret with a cluster:
{
    "clusterArn": "arn:aws:kafka:us-west-2:0123456789019:cluster/SalesCluster/abcd1234-abcd-cafe-abab-9876543210ab-4",
    "secretArnList": [
        "arn:aws:secretsmanager:us-west-2:0123456789019:secret:AmazonMSK_MyClusterSecret"
    ]
}
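A sketch of the equivalent AWS CLI call follows; ClusterArn and SecretArn are placeholders for your own values:

aws kafka batch-associate-scram-secret \
    --cluster-arn ClusterArn \
    --secret-arn-list SecretArn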
Connecting to your cluster with sign-in credentials
After you create a secret and associate it with your cluster, you can connect your client to the
cluster. The following example steps demonstrate how to connect a client to a cluster that uses
SASL/SCRAM authentication, and how to produce to and consume from an example topic.
1. Run the following command on a machine that has the AWS CLI installed, replacing
clusterARN with the ARN of your cluster.
aws kafka get-bootstrap-brokers --cluster-arn clusterARN
2.
To create an example topic, run the following command, replacing BootstrapServerString
with one of the broker endpoints that you obtained in the previous step.
<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --bootstrap-
server BootstrapServerString --replication-factor 3 --partitions 1 --topic
ExampleTopicName
3. On your client machine, create a JAAS configuration file that contains the user credentials
stored in your secret. For example, for the user alice, create a file called users_jaas.conf
with the following content.
KafkaClient {
    org.apache.kafka.common.security.scram.ScramLoginModule required
    username="alice"
    password="alice-secret";
};
4.
Use the following command to export your JAAS config file as a KAFKA_OPTS environment
parameter.
export KAFKA_OPTS=-Djava.security.auth.login.config=<path-to-jaas-file>/
users_jaas.conf
5.
Create a file named kafka.client.truststore.jks in a /tmp directory.
6.
Use the following command to copy the JDK key store file from your JVM cacerts folder
into the kafka.client.truststore.jks file that you created in the previous step. Replace
JDKFolder with the name of the JDK folder on your instance. For example, your JDK folder
might be named java-1.8.0-openjdk-1.8.0.201.b09-0.amzn2.x86_64.
cp /usr/lib/jvm/JDKFolder/jre/lib/security/cacerts /tmp/kafka.client.truststore.jks
7.
In the bin directory of your Apache Kafka installation, create a client properties file called
client_sasl.properties with the following contents. This file defines the SASL
mechanism and protocol.
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
ssl.truststore.location=<path-to-keystore-file>/kafka.client.truststore.jks
8.
Retrieve your bootstrap brokers string with the following command. Replace ClusterArn
with the Amazon Resource Name (ARN) of your cluster:
aws kafka get-bootstrap-brokers --cluster-arn ClusterArn
From the JSON result of the command, save the value associated with the string named
BootstrapBrokerStringSaslScram.
9. To produce to the example topic that you created, run the following command on your client
machine. Replace BootstrapBrokerStringSaslScram with the value that you retrieved in
the previous step.
<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-
list BootstrapBrokerStringSaslScram --topic ExampleTopicName --producer.config
client_sasl.properties
10. To consume from the topic you created, run the following command on your client machine.
Replace BootstrapBrokerStringSaslScram with the value that you obtained previously.
<path-to-your-kafka-installation>/bin/kafka-console-consumer.sh --bootstrap-
server BootstrapBrokerStringSaslScram --topic ExampleTopicName --from-beginning --
consumer.config client_sasl.properties
Working with users
Creating users: You create users in your secret as key-value pairs. When you use the Plaintext
option in the Secrets Manager console, you should specify sign-in credential data in the following
format.
{
    "username": "alice",
    "password": "alice-secret"
}
Revoking user access: To revoke a user's credentials to access a cluster, we recommend that you
first remove or enforce an ACL on the cluster, and then disassociate the secret. This is because of
the following:
Removing a user does not close existing connections.
Changes to your secret take up to 10 minutes to propagate.
For information about using an ACL with Amazon MSK, see Apache Kafka ACLs.
For clusters using ZooKeeper mode, we recommend that you restrict access to your ZooKeeper
nodes to prevent users from modifying ACLs. For more information, see Control access to Apache
ZooKeeper nodes in your Amazon MSK cluster.
Limitations when using SCRAM secrets
Note the following limitations when using SCRAM secrets:
Amazon MSK only supports SCRAM-SHA-512 authentication.
An Amazon MSK cluster can have up to 1000 users.
You must use a custom AWS KMS key with your secret. You cannot use a secret that uses the default Secrets Manager encryption key with Amazon MSK. For information about creating a KMS key, see Creating symmetric encryption KMS keys.
You can't use an asymmetric KMS key with Secrets Manager.
You can associate up to 10 secrets with a cluster at a time using the BatchAssociateScramSecret
operation.
The name of secrets associated with an Amazon MSK cluster must have the prefix AmazonMSK_.
Secrets associated with an Amazon MSK cluster must be in the same Amazon Web Services
account and AWS region as the cluster.
Apache Kafka ACLs
Apache Kafka has a pluggable authorizer and ships with an out-of-the-box authorizer implementation. Amazon MSK enables this authorizer in the server.properties file on the brokers.
Apache Kafka ACLs have the format "Principal P is [Allowed/Denied] Operation O From Host H
on any Resource R matching ResourcePattern RP". If RP doesn't match a specific resource R, then
R has no associated ACLs, and therefore no one other than super users is allowed to access R. To
change this Apache Kafka behavior, you set the property allow.everyone.if.no.acl.found
to true. Amazon MSK sets it to true by default. This means that with Amazon MSK clusters, if you
don't explicitly set ACLs on a resource, all principals can access this resource. If you enable ACLs on
a resource, only the authorized principals can access it. If you want to restrict access to a topic and
authorize a client using TLS mutual authentication, add ACLs using the Apache Kafka authorizer
CLI. For more information about adding, removing, and listing ACLs, see Kafka Authorization
Command Line Interface.
In addition to the client, you also need to grant all your brokers access to your topics so that the
brokers can replicate messages from the primary partition. If the brokers don't have access to a
topic, replication for the topic fails.
To add or remove read and write access to a topic
1. Add your brokers to the ACL table to allow them to read from all topics that have ACLs in
place. To grant your brokers read access to a topic, run the following command on a client
machine that can communicate with the MSK cluster.
Replace Distinguished-Name with the DNS name of any of your cluster's bootstrap brokers, then replace the string before the first period in this distinguished name with an asterisk (*). For example, if one of your cluster's bootstrap brokers has the DNS name b-6.mytestcluster.67281x.c4.kafka.us-east-1.amazonaws.com, replace Distinguished-Name in the following command with *.mytestcluster.67281x.c4.kafka.us-east-1.amazonaws.com. For information on how to get the bootstrap brokers, see the section called “Get the bootstrap brokers for an Amazon MSK cluster”.
<path-to-your-kafka-installation>/bin/kafka-acls.sh --bootstrap-server BootstrapServerString --add --allow-principal "User:CN=Distinguished-Name" --operation Read --group=* --topic Topic-Name
2. To grant read access to a topic, run the following command on your client machine. If you use
mutual TLS authentication, use the same Distinguished-Name you used when you created
the private key.
<path-to-your-kafka-installation>/bin/kafka-acls.sh --bootstrap-server BootstrapServerString --add --allow-principal "User:CN=Distinguished-Name" --operation Read --group=* --topic Topic-Name
To remove read access, you can run the same command, replacing --add with --remove.
3. To grant write access to a topic, run the following command on your client machine. If you use
mutual TLS authentication, use the same Distinguished-Name you used when you created
the private key.
<path-to-your-kafka-installation>/bin/kafka-acls.sh --bootstrap-server BootstrapServerString --add --allow-principal "User:CN=Distinguished-Name" --operation Write --topic Topic-Name
To remove write access, you can run the same command, replacing --add with --remove.
Changing an Amazon MSK cluster's security group
This page explains how to change the security group of an existing MSK cluster. You might need
to change a cluster's security group in order to provide access to a certain set of users or to limit
access to the cluster. For information about security groups, see Security groups for your VPC in the
Amazon VPC user guide.
1. Use the ListNodes API or the list-nodes command in the AWS CLI to get a list of the brokers
in your cluster. The results of this operation include the IDs of the elastic network interfaces
(ENIs) that are associated with the brokers.
2. Sign in to the AWS Management Console and open the Amazon EC2 console at https://
console.aws.amazon.com/ec2/.
3. Using the dropdown list near the top-right corner of the screen, select the Region in which the
cluster is deployed.
4. In the left pane, under Network & Security, choose Network Interfaces.
5. Select the first ENI that you obtained in the first step. Choose the Actions menu at the top of
the screen, then choose Change Security Groups. Assign the new security group to this ENI.
Repeat this step for each of the ENIs that you obtained in the first step.
Note
Changes that you make to a cluster's security group using the Amazon EC2 console
aren't reflected in the MSK console under Network settings.
6. Configure the new security group's rules to ensure that your clients have access to the brokers.
For information about setting security group rules, see Adding, Removing, and Updating Rules
in the Amazon VPC user guide.
Important
If you change the security group that is associated with the brokers of a cluster, and then
add new brokers to that cluster, Amazon MSK associates the new brokers with the original
security group that was associated with the cluster when the cluster was created. However,
for a cluster to work correctly, all of its brokers must be associated with the same security
group. Therefore, if you add new brokers after changing the security group, you must
follow the previous procedure again and update the ENIs of the new brokers.
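If you prefer the AWS CLI, the following sketch shows the equivalent calls; the ENI and security group IDs are placeholders, and you repeat the modify call for each broker ENI returned by the first command:

aws kafka list-nodes --cluster-arn ClusterArn \
    --query "NodeInfoList[*].BrokerNodeInfo.AttachedENIId" --output text

aws ec2 modify-network-interface-attribute \
    --network-interface-id eni-0123456789abcdef0 \
    --groups sg-0123456789abcdef0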
Control access to Apache ZooKeeper nodes in your Amazon
MSK cluster
For security reasons you can limit access to the Apache ZooKeeper nodes that are part of your
Amazon MSK cluster. To limit access to the nodes, you can assign a separate security group to
them. You can then decide who gets access to that security group.
Important
This section does not apply to clusters running in KRaft mode. See the section called “KRaft mode”.
This topic contains the following sections:
To place your Apache ZooKeeper nodes in a separate security group
Using TLS security with Apache ZooKeeper
To place your Apache ZooKeeper nodes in a separate security group
To limit access to Apache ZooKeeper nodes, you can assign a separate security group to them. You
can choose who has access to this new security group by setting security group rules.
1. Get the Apache ZooKeeper connection string for your cluster. To learn how, see the section
called “ZooKeeper mode”. The connection string contains the DNS names of your Apache
ZooKeeper nodes.
2.
Use a tool like host or ping to convert the DNS names you obtained in the previous step to IP
addresses. Save these IP addresses because you need them later in this procedure.
3. Sign in to the AWS Management Console and open the Amazon EC2 console at https://
console.aws.amazon.com/ec2/.
4. In the left pane, under NETWORK & SECURITY, choose Network Interfaces.
5. In the search field above the table of network interfaces, type the name of your cluster, then
type return. This limits the number of network interfaces that appear in the table to those
interfaces that are associated with your cluster.
6. Select the check box at the beginning of the row that corresponds to the first network
interface in the list.
7. In the details pane at the bottom of the page, look for the Primary private IPv4 IP. If this IP
address matches one of the IP addresses you obtained in the first step of this procedure, this
means that this network interface is assigned to an Apache ZooKeeper node that is part of
your cluster. Otherwise, deselect the check box next to this network interface, and select the
next network interface in the list. The order in which you select the network interfaces doesn't
matter. In the next steps, you will perform the same operations on all network interfaces that
are assigned to Apache ZooKeeper nodes, one by one.
8. When you select a network interface that corresponds to an Apache ZooKeeper node, choose
the Actions menu at the top of the page, then choose Change Security Groups. Assign a new
security group to this network interface. For information about creating security groups, see
Creating a Security Group in the Amazon VPC documentation.
9. Repeat the previous step to assign the same new security group to all the network interfaces
that are associated with the Apache ZooKeeper nodes of your cluster.
10. You can now choose who has access to this new security group. For information about
setting security group rules, see Adding, Removing, and Updating Rules in the Amazon VPC
documentation.
Using TLS security with Apache ZooKeeper
You can use TLS security for encryption in transit between your clients and your Apache ZooKeeper
nodes. To implement TLS security with your Apache ZooKeeper nodes, do the following:
Clusters must use Apache Kafka version 2.5.1 or later to use TLS security with Apache ZooKeeper.
Enable TLS security when you create or configure your cluster. Clusters created with Apache
Kafka version 2.5.1 or later with TLS enabled automatically use TLS security with Apache
ZooKeeper endpoints. For information about setting up TLS security, see Get started with
Amazon MSK encryption.
Retrieve the TLS Apache ZooKeeper endpoints using the DescribeCluster operation.
Create an Apache ZooKeeper configuration file for use with the kafka-configs.sh and
kafka-acls.sh tools, or with the ZooKeeper shell. With each tool, you use the --zk-tls-
config-file parameter to specify your Apache ZooKeeper config.
The following example shows a typical Apache ZooKeeper configuration file:
zookeeper.ssl.client.enable=true
zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
zookeeper.ssl.keystore.location=kafka.jks
zookeeper.ssl.keystore.password=test1234
zookeeper.ssl.truststore.location=truststore.jks
zookeeper.ssl.truststore.password=test1234
For other commands (such as kafka-topics), you must use the KAFKA_OPTS environment
variable to configure Apache ZooKeeper parameters. The following example shows how to
configure the KAFKA_OPTS environment variable to pass Apache ZooKeeper parameters into
other commands:
export KAFKA_OPTS="
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
-Dzookeeper.client.secure=true
-Dzookeeper.ssl.trustStore.location=/home/ec2-user/kafka.client.truststore.jks
-Dzookeeper.ssl.trustStore.password=changeit"
After you configure the KAFKA_OPTS environment variable, you can use CLI commands normally.
The following example creates an Apache Kafka topic using the Apache ZooKeeper configuration
from the KAFKA_OPTS environment variable:
<path-to-your-kafka-installation>/bin/kafka-topics.sh --create --
zookeeper ZooKeeperTLSConnectString --replication-factor 3 --partitions 1 --topic
AWSKafkaTutorialTopic
Note
The names of the parameters you use in your Apache ZooKeeper configuration file and
those you use in your KAFKA_OPTS environment variable are not consistent. Pay attention
to which names you use with which parameters in your configuration file and KAFKA_OPTS
environment variable.
For more information about accessing your Apache ZooKeeper nodes with TLS, see KIP-515:
Enable ZK client to use the new TLS supported authentication.
Amazon MSK logging
You can deliver Apache Kafka broker logs to one or more of the following destination types:
Amazon CloudWatch Logs, Amazon S3, Amazon Data Firehose. You can also log Amazon MSK API
calls with AWS CloudTrail.
Broker logs
Broker logs enable you to troubleshoot your Apache Kafka applications and to analyze their
communications with your MSK cluster. You can configure your new or existing MSK cluster to
deliver INFO-level broker logs to one or more of the following types of destination resources:
a CloudWatch log group, an S3 bucket, a Firehose delivery stream. Through Firehose you can
then deliver the log data from your delivery stream to OpenSearch Service. You must create a
destination resource before you configure your cluster to deliver broker logs to it. Amazon MSK
doesn't create these destination resources for you if they don't already exist. For information
about these three types of destination resources and how to create them, see the following
documentation:
Amazon CloudWatch Logs
Amazon S3
Amazon Data Firehose
Required permissions
To configure a destination for Amazon MSK broker logs, the IAM identity that you use for Amazon MSK actions must have the permissions described in the AWS managed policy AmazonMSKFullAccess.
To stream broker logs to an S3 bucket, you also need the s3:PutBucketPolicy permission. For
information about S3 bucket policies, see How Do I Add an S3 Bucket Policy? in the Amazon S3
User Guide. For information about IAM policies in general, see Access Management in the IAM User
Guide.
Required KMS key policy for use with SSE-KMS buckets
If you enabled server-side encryption for your S3 bucket using AWS KMS-managed keys (SSE-
KMS) with a customer managed key, add the following to the key policy for your KMS key so that
Amazon MSK can write broker files to the bucket.
{
    "Sid": "Allow Amazon MSK to use the key.",
    "Effect": "Allow",
    "Principal": {
        "Service": [
            "delivery.logs.amazonaws.com"
        ]
    },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
    ],
    "Resource": "*"
}
Configure broker logs using the AWS Management Console
If you are creating a new cluster, look for the Broker log delivery heading in the Monitoring
section. You can specify the destinations to which you want Amazon MSK to deliver your broker
logs.
For an existing cluster, choose the cluster from your list of clusters, then choose the Properties
tab. Scroll down to the Log delivery section and then choose its Edit button. You can specify the
destinations to which you want Amazon MSK to deliver your broker logs.
Configure broker logs using the AWS CLI
When you use the create-cluster or the update-monitoring commands, you can optionally
specify the logging-info parameter and pass to it a JSON structure like the following example.
In this JSON, all three destination types are optional.
{
    "BrokerLogs": {
        "S3": {
            "Bucket": "ExampleBucketName",
            "Prefix": "ExamplePrefix",
            "Enabled": true
        },
        "Firehose": {
            "DeliveryStream": "ExampleDeliveryStreamName",
            "Enabled": true
        },
        "CloudWatchLogs": {
            "Enabled": true,
            "LogGroup": "ExampleLogGroupName"
        }
    }
}
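For example, assuming you save the JSON above in a file named logging-info.json, a sketch of the update-monitoring call looks like the following; Cluster-ARN and Current-Cluster-Version are placeholders, and you can get the current cluster version from the describe-cluster command:

aws kafka update-monitoring \
    --cluster-arn Cluster-ARN \
    --current-version Current-Cluster-Version \
    --logging-info file://logging-info.json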
Configure broker logs using the API
You can specify the optional loggingInfo structure in the JSON that you pass to the
CreateCluster or UpdateMonitoring operations.
Note
By default, when broker logging is enabled, Amazon MSK logs INFO level logs to the
specified destinations. However, users of Apache Kafka 2.4.X and later can dynamically
set the broker log level to any of the log4j log levels. For information about dynamically
setting the broker log level, see KIP-412: Extend Admin API to support dynamic application
log levels. If you dynamically set the log level to DEBUG or TRACE, we recommend using
Amazon S3 or Firehose as the log destination. If you use CloudWatch Logs as a log
destination and you dynamically enable DEBUG or TRACE level logging, Amazon MSK may
continuously deliver a sample of logs. This can significantly impact broker performance and
should only be used when the INFO log level is not verbose enough to determine the root
cause of an issue.
Log API calls with AWS CloudTrail
Note
AWS CloudTrail logs are available for Amazon MSK only when you use IAM access control.
Amazon MSK is integrated with AWS CloudTrail, a service that provides a record of actions taken
by a user, role, or an AWS service in Amazon MSK. CloudTrail captures API calls for Amazon MSK as events. The
calls captured include calls from the Amazon MSK console and code calls to the Amazon MSK API
operations. It also captures Apache Kafka actions such as creating and altering topics and groups.
If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3
bucket, including events for Amazon MSK. If you don't configure a trail, you can still view the
most recent events in the CloudTrail console in Event history. Using the information collected by
CloudTrail, you can determine the request that was made to Amazon MSK or the Apache Kafka
action, the IP address from which the request was made, who made the request, when it was made,
and additional details.
To learn more about CloudTrail, including how to configure and enable it, see the AWS CloudTrail
User Guide.
Amazon MSK information in CloudTrail
CloudTrail is enabled on your Amazon Web Services account when you create the account. When
supported event activity occurs in an MSK cluster, that activity is recorded in a CloudTrail event
along with other AWS service events in Event history. You can view, search, and download recent
events in your Amazon Web Services account. For more information, see Viewing Events with
CloudTrail Event History.
For an ongoing record of events in your Amazon Web Services account, including events for
Amazon MSK, create a trail. A trail enables CloudTrail to deliver log files to an Amazon S3 bucket.
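For example, a minimal sketch of creating and starting a trail with the AWS CLI (the trail and bucket names are placeholders, and the bucket must already have a bucket policy that allows CloudTrail to write to it):

aws cloudtrail create-trail --name ExampleTrailName --s3-bucket-name ExampleBucketName --is-multi-region-trail
aws cloudtrail start-logging --name ExampleTrailName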
By default, when you create a trail in the console, the trail applies to all Regions. The trail logs
events from all Regions in the AWS partition and delivers the log files to the Amazon S3 bucket
that you specify. Additionally, you can configure other Amazon services to further analyze and act
upon the event data collected in CloudTrail logs. For more information, see the following:
Overview for Creating a Trail
CloudTrail Supported Services and Integrations
Configuring Amazon SNS Notifications for CloudTrail
Receiving CloudTrail Log Files from Multiple Regions and Receiving CloudTrail Log Files from
Multiple Accounts
Amazon MSK logs all Amazon MSK operations as events in CloudTrail log files. In addition, it logs
the following Apache Kafka actions.
kafka-cluster:DescribeClusterDynamicConfiguration
kafka-cluster:AlterClusterDynamicConfiguration
kafka-cluster:CreateTopic
kafka-cluster:DescribeTopicDynamicConfiguration
kafka-cluster:AlterTopic
kafka-cluster:AlterTopicDynamicConfiguration
kafka-cluster:DeleteTopic
Every event or log entry contains information about who generated the request. The identity
information helps you determine the following:
Whether the request was made with root user or AWS Identity and Access Management (IAM)
user credentials.
Whether the request was made with temporary security credentials for a role or federated user.
Whether the request was made by another AWS service.
For more information, see the CloudTrail userIdentity Element.
Example: Amazon MSK log file entries
A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that
you specify. CloudTrail log files contain one or more log entries. An event represents a single
request from any source and includes information about the requested action, the date and time of
the action, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the
public API calls and Apache Kafka actions, so they don't appear in any specific order.
The following example shows CloudTrail log entries that demonstrate the DescribeCluster and
DeleteCluster Amazon MSK actions.
{
    "Records": [
        {
            "eventVersion": "1.05",
            "userIdentity": {
                "type": "IAMUser",
                "principalId": "ABCDEF0123456789ABCDE",
                "arn": "arn:aws:iam::012345678901:user/Joe",
                "accountId": "012345678901",
                "accessKeyId": "AIDACKCEVSQ6C2EXAMPLE",
                "userName": "Joe"
            },
            "eventTime": "2018-12-12T02:29:24Z",
            "eventSource": "kafka.amazonaws.com",
            "eventName": "DescribeCluster",
            "awsRegion": "us-east-1",
            "sourceIPAddress": "192.0.2.0",
            "userAgent": "aws-cli/1.14.67 Python/3.6.0 Windows/10 botocore/1.9.20",
            "requestParameters": {
                "clusterArn": "arn%3Aaws%3Akafka%3Aus-east-1%3A012345678901%3Acluster%2Fexamplecluster%2F01234567-abcd-0123-abcd-abcd0123efa-2"
            },
            "responseElements": null,
            "requestID": "bd83f636-fdb5-abcd-0123-157e2fbf2bde",
            "eventID": "60052aba-0123-4511-bcde-3e18dbd42aa4",
            "readOnly": true,
            "eventType": "AwsApiCall",
            "recipientAccountId": "012345678901"
        },
        {
            "eventVersion": "1.05",
            "userIdentity": {
                "type": "IAMUser",
                "principalId": "ABCDEF0123456789ABCDE",
                "arn": "arn:aws:iam::012345678901:user/Joe",
                "accountId": "012345678901",
                "accessKeyId": "AIDACKCEVSQ6C2EXAMPLE",
                "userName": "Joe"
            },
            "eventTime": "2018-12-12T02:29:40Z",
            "eventSource": "kafka.amazonaws.com",
            "eventName": "DeleteCluster",
            "awsRegion": "us-east-1",
            "sourceIPAddress": "192.0.2.0",
            "userAgent": "aws-cli/1.14.67 Python/3.6.0 Windows/10 botocore/1.9.20",
            "requestParameters": {
                "clusterArn": "arn%3Aaws%3Akafka%3Aus-east-1%3A012345678901%3Acluster%2Fexamplecluster%2F01234567-abcd-0123-abcd-abcd0123efa-2"
            },
            "responseElements": {
                "clusterArn": "arn:aws:kafka:us-east-1:012345678901:cluster/examplecluster/01234567-abcd-0123-abcd-abcd0123efa-2",
                "state": "DELETING"
            },
            "requestID": "c6bfb3f7-abcd-0123-afa5-293519897703",
            "eventID": "8a7f1fcf-0123-abcd-9bdb-1ebf0663a75c",
            "readOnly": false,
            "eventType": "AwsApiCall",
            "recipientAccountId": "012345678901"
        }
    ]
}
The following example shows a CloudTrail log entry that demonstrates the
kafka-cluster:CreateTopic action.
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "ABCDEFGH1IJKLMN2P34Q5",
        "arn": "arn:aws:iam::111122223333:user/Admin",
        "accountId": "111122223333",
        "accessKeyId": "CDEFAB1C2UUUUU3AB4TT",
        "userName": "Admin"
    },
    "eventTime": "2021-03-01T12:51:19Z",
    "eventSource": "kafka-cluster.amazonaws.com",
    "eventName": "CreateTopic",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "198.51.100.0/24",
    "userAgent": "aws-msk-iam-auth/unknown-version/aws-internal/3 aws-sdk-java/1.11.970 Linux/4.14.214-160.339.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/25.272-b10 java/1.8.0_272 scala/2.12.8 vendor/Red_Hat,_Inc.",
    "requestParameters": {
        "kafkaAPI": "CreateTopics",
        "resourceARN": "arn:aws:kafka:us-east-1:111122223333:topic/IamAuthCluster/3ebafd8e-dae9-440d-85db-4ef52679674d-1/Topic9"
    },
    "responseElements": null,
    "requestID": "e7c5e49f-6aac-4c9a-a1d1-c2c46599f5e4",
    "eventID": "be1f93fd-4f14-4634-ab02-b5a79cb833d2",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "eventCategory": "Management",
    "recipientAccountId": "111122223333"
}
Compliance validation for Amazon Managed Streaming for
Apache Kafka
Third-party auditors assess the security and compliance of Amazon Managed Streaming for Apache
Kafka as part of AWS compliance programs. These include PCI and HIPAA BAA.
For a list of AWS services in scope of specific compliance programs, see Amazon Services in Scope
by Compliance Program. For general information, see AWS Compliance Programs.
You can download third-party audit reports using AWS Artifact. For more information, see
Downloading Reports in AWS Artifact.
Your compliance responsibility when using Amazon MSK is determined by the sensitivity of your
data, your company's compliance objectives, and applicable laws and regulations. AWS provides the
following resources to help with compliance:
Security and Compliance Quick Start Guides – These deployment guides discuss architectural
considerations and provide steps for deploying security- and compliance-focused baseline
environments on AWS.
Architecting for HIPAA Security and Compliance Whitepaper – This whitepaper describes how
companies can use AWS to create HIPAA-compliant applications.
AWS Compliance Resources – This collection of workbooks and guides might apply to your
industry and location.
Evaluating Resources with Rules in the AWS Config Developer Guide – The AWS Config service
assesses how well your resource configurations comply with internal practices, industry
guidelines, and regulations.
AWS Security Hub – This AWS service provides a comprehensive view of your security state within
AWS that helps you check your compliance with security industry standards and best practices.
Resilience in Amazon Managed Streaming for Apache Kafka
The AWS global infrastructure is built around AWS Regions and Availability Zones. AWS Regions
provide multiple physically separated and isolated Availability Zones, which are connected with
low-latency, high-throughput, and highly redundant networking. With Availability Zones, you
can design and operate applications and databases that automatically fail over between zones
without interruption. Availability Zones are more highly available, fault tolerant, and scalable than
traditional single or multiple data center infrastructures.
For more information about AWS Regions and Availability Zones, see AWS Global Infrastructure.
Infrastructure security in Amazon Managed Streaming for
Apache Kafka
As a managed service, Amazon Managed Streaming for Apache Kafka is protected by the AWS
global network security procedures that are described in the Amazon Web Services: Overview of
Security Processes whitepaper.
You use AWS published API calls to access Amazon MSK through the network. Clients must support
Transport Layer Security (TLS) 1.0 or later. We recommend TLS 1.2 or later. Clients must also
support cipher suites with perfect forward secrecy (PFS) such as Ephemeral Diffie-Hellman (DHE)
or Elliptic Curve Ephemeral Diffie-Hellman (ECDHE). Most modern systems such as Java 7 and later
support these modes.
Additionally, requests must be signed by using an access key ID and a secret access key that is
associated with an IAM principal. Or you can use the AWS Security Token Service (AWS STS) to
generate temporary security credentials to sign requests.
Connect to an Amazon MSK cluster
By default, clients can access an MSK cluster only if they're in the same VPC as the cluster. All
communication between your Kafka clients and your MSK cluster is private by default, and your
streaming data never traverses the internet. To connect to your MSK cluster from a client that's
in the same VPC as the cluster, make sure the cluster's security group has an inbound rule that
accepts traffic from the client's security group. For information about setting up these rules, see
Security Group Rules. For an example of how to access a cluster from an Amazon EC2 instance
that's in the same VPC as the cluster, see Get started.
To connect to your MSK cluster from a client that's outside the cluster's VPC, see Access from within
AWS but outside cluster's VPC.
Topics
Turn on public access to an MSK cluster
Access from within AWS but outside cluster's VPC
Turn on public access to an MSK cluster
Amazon MSK gives you the option to turn on public access to the brokers of MSK clusters running
Apache Kafka 2.6.0 or later versions. For security reasons, you can't turn on public access while
creating an MSK cluster. However, you can update an existing cluster to make it publicly accessible.
You can also create a new cluster and then update it to make it publicly accessible.
You can turn on public access to an MSK cluster at no additional cost, but standard AWS data
transfer costs apply for data transfer in and out of the cluster. For information about pricing, see
Amazon EC2 On-Demand Pricing.
To turn on public access to a cluster, first ensure that the cluster meets all of the following
conditions:
The subnets that are associated with the cluster must be public. This means that the subnets
must have an associated route table with an internet gateway attached. For information about
how to create and attach an internet gateway, see Internet gateways in the Amazon VPC User
Guide.
Unauthenticated access control must be off and at least one of the following access-control
methods must be on: SASL/IAM, SASL/SCRAM, or mTLS. For information about how to update the
access-control method of a cluster, see the section called “Update Amazon MSK cluster security”.
Encryption within the cluster must be turned on. The on setting is the default when creating a
cluster. It's not possible to turn on encryption within the cluster for a cluster that was created
with it turned off. It is therefore not possible to turn on public access for a cluster that was
created with encryption within the cluster turned off.
Plaintext traffic between brokers and clients must be off. For information about how to turn it off
if it's on, see the section called “Update Amazon MSK cluster security”.
If you are using the SASL/SCRAM or mTLS access-control methods, you must set Apache Kafka
ACLs for your cluster. After you set the Apache Kafka ACLs for your cluster, update the cluster's
configuration to set the property allow.everyone.if.no.acl.found to false for the
cluster (a sketch of this configuration update follows this list). For information about how to
update the configuration of a cluster, see the section called “Amazon MSK configuration
operations”. If you are using IAM access control and want to apply authorization policies or
update your authorization policies, see the section called “IAM access
control”. For information about Apache Kafka ACLs, see the section called “Apache Kafka ACLs”.
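For reference, a minimal sketch of that configuration update with the AWS CLI (the names, ARNs, and version are placeholders, and server.properties is assumed to contain the line allow.everyone.if.no.acl.found=false):

aws kafka create-configuration --name ExampleConfigName --server-properties fileb://server.properties
aws kafka update-cluster-configuration --cluster-arn ClusterArn --configuration-info '{"Arn": "ConfigurationArn", "Revision": 1}' --current-version Current-Cluster-Version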
After you ensure that an MSK cluster meets the conditions listed above, you can use the AWS
Management Console, the AWS CLI, or the Amazon MSK API to turn on public access. After you turn
on public access to a cluster, you can get a public bootstrap-brokers string for it. For information
about getting the bootstrap brokers for a cluster, see the section called “Get the bootstrap brokers
for an Amazon MSK cluster”.
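For example, the following command returns the bootstrap broker strings for a cluster (the ARN is a placeholder). For a cluster with public access and IAM access control turned on, the output includes public strings such as BootstrapBrokerStringPublicSaslIam.

aws kafka get-bootstrap-brokers --cluster-arn ClusterArn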
Important
In addition to turning on public access, ensure that the cluster's security groups have
inbound TCP rules that allow public access from your IP address. We recommend that
you make these rules as restrictive as possible. For information about security groups and
inbound rules, see Security groups for your VPC in the Amazon VPC User Guide. For port
numbers, see the section called “Port information”. For instructions on how to change a
cluster's security group, see the section called “Changing security groups”.
Note
If you use the following instructions to turn on public access and then still cannot access
the cluster, see the section called “Unable to access cluster that has public access turned
on”.
Turning on public access using the console
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. In the list of clusters, choose the cluster for which you want to turn on public access.
3. Choose the Properties tab, then find the Network settings section.
4. Choose Edit public access.
Turning on public access using the AWS CLI
1. Run the following AWS CLI command, replacing ClusterArn and Current-Cluster-Version
with the ARN and current version of the cluster. To find the current version of the
cluster, use the DescribeCluster operation or the describe-cluster AWS CLI command. An
example version is KTVPDKIKX0DER.

aws kafka update-connectivity --cluster-arn ClusterArn --current-version Current-Cluster-Version --connectivity-info '{"PublicAccess": {"Type": "SERVICE_PROVIDED_EIPS"}}'
The output of this update-connectivity command looks like the following JSON example.
{
    "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
    "ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef"
}
Note
To turn off public access, use a similar AWS CLI command, but with the following
connectivity info instead:
'{"PublicAccess": {"Type": "DISABLED"}}'
2. To get the result of the update-connectivity operation, run the following command,
replacing ClusterOperationArn with the ARN that you obtained in the output of the
update-connectivity command.

aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
    "ClusterOperationInfo": {
        "ClientRequestId": "982168a3-939f-11e9-8a62-538df00285db",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2019-06-20T21:08:57.735Z",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "UPDATE_COMPLETE",
        "OperationType": "UPDATE_CONNECTIVITY",
        "SourceClusterInfo": {
            "ConnectivityInfo": {
                "PublicAccess": {
                    "Type": "DISABLED"
                }
            }
        },
        "TargetClusterInfo": {
            "ConnectivityInfo": {
                "PublicAccess": {
                    "Type": "SERVICE_PROVIDED_EIPS"
                }
            }
        }
    }
}
If OperationState has the value UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again.
Turning on public access using the Amazon MSK API
To use the API to turn public access to a cluster on or off, see UpdateConnectivity.
Note
For security reasons, Amazon MSK doesn't allow public access to Apache ZooKeeper or
KRaft controller nodes.
Access from within AWS but outside cluster's VPC
To connect to an MSK cluster from inside AWS but outside the cluster's Amazon VPC, the following
options exist.
Amazon VPC peering
To connect to your MSK cluster from a VPC that's different from the cluster's VPC, you can create
a peering connection between the two VPCs. For information about VPC peering, see the Amazon
VPC Peering Guide.
AWS Direct Connect
AWS Direct Connect links your on-premises network to AWS over a standard 1 gigabit or 10 gigabit
Ethernet fiber-optic cable. One end of the cable is connected to your router, the other to an AWS
Direct Connect router. With this connection in place, you can create virtual interfaces directly to the
AWS cloud and Amazon VPC, bypassing Internet service providers in your network path. For more
information, see AWS Direct Connect.
AWS Transit Gateway
AWS Transit Gateway is a service that enables you to connect your VPCs and your on-premises
networks to a single gateway. For information about how to use AWS Transit Gateway, see AWS
Transit Gateway.
VPN connections
You can connect your MSK cluster's VPC to remote networks and users using the VPN connectivity
options described in the following topic: VPN Connections.
REST proxies
You can install a REST proxy on an instance running within your cluster's Amazon VPC. REST
proxies enable your producers and consumers to communicate with the cluster through HTTP
API requests.
Multiple Region multi-VPC connectivity
The following document describes connectivity options for multiple VPCs that reside in different
Regions: Multiple Region Multi-VPC Connectivity.
Single Region multi-VPC private connectivity
Multi-VPC private connectivity (powered by AWS PrivateLink) for Amazon Managed Streaming for
Apache Kafka (Amazon MSK) clusters is a feature that enables you to more quickly connect Kafka
clients hosted in different Virtual Private Clouds (VPCs) and AWS accounts to an Amazon MSK
cluster.
See Single Region multi-VPC connectivity for cross-account clients.
EC2-Classic networking is retired
Amazon MSK no longer supports Amazon EC2 instances running with Amazon EC2-Classic
networking.
See EC2-Classic Networking is Retiring – Here’s How to Prepare.
Amazon MSK multi-VPC private connectivity in a single Region
Multi-VPC private connectivity (powered by AWS PrivateLink) for Amazon Managed Streaming for
Apache Kafka (Amazon MSK) clusters is a feature that enables you to more quickly connect Kafka
clients hosted in different Virtual Private Clouds (VPCs) and AWS accounts to an Amazon MSK
cluster.
Multi-VPC private connectivity is a managed solution that simplifies the networking infrastructure
for multi-VPC and cross-account connectivity. Clients can connect to the Amazon MSK cluster over
PrivateLink while keeping all traffic within the AWS network. Multi-VPC private connectivity for
Amazon MSK clusters is available in all AWS Regions where Amazon MSK is available.
Topics
What is multi-VPC private connectivity?
Benefits of multi-VPC private connectivity
Requirements and limitations for multi-VPC private connectivity
Get started using multi-VPC private connectivity
Update the authorization schemes on a cluster
Reject a managed VPC connection to an Amazon MSK cluster
Delete a managed VPC connection to an Amazon MSK cluster
Permissions for multi-VPC private connectivity
What is multi-VPC private connectivity?
Multi-VPC private connectivity for Amazon MSK is a connectivity option that enables you to
connect Apache Kafka clients that are hosted in different Virtual Private Clouds (VPCs) and AWS
accounts to an MSK cluster.
Amazon MSK simplifies cross-account access with cluster policies. These policies allow the cluster
owner to grant permissions for other AWS accounts to establish private connectivity to the MSK
cluster.
Benefits of multi-VPC private connectivity
Multi-VPC private connectivity has several advantages over other connectivity solutions:
It automates operational management of the AWS PrivateLink connectivity solution.
It allows overlapping IPs across connecting VPCs, eliminating the need to maintain non-
overlapping IPs, complex peering, and routing tables associated with other VPC connectivity
solutions.
You use a cluster policy for your MSK cluster to define which AWS accounts have permissions to set
up cross-account private connectivity to your MSK cluster. The cross-account admin can delegate
permissions to appropriate roles or users. When used with IAM client authentication, you can also
use the cluster policy to define Kafka data plane permissions on a granular basis for the connecting
clients.
Requirements and limitations for multi-VPC private connectivity
Note these MSK cluster requirements for running multi-VPC private connectivity:
Multi-VPC private connectivity is supported only on Apache Kafka 2.7.1 or higher. Make sure
that any clients that you use with the MSK cluster are running Apache Kafka versions that are
compatible with the cluster.
Multi-VPC private connectivity supports the IAM, TLS, and SASL/SCRAM auth types. Unauthenticated
clusters can't use multi-VPC private connectivity.
If you are using the SASL/SCRAM or mTLS access-control methods, you must set Apache Kafka
ACLs for your cluster. First, set the Apache Kafka ACLs for your cluster. Then, update the cluster's
configuration to have the property allow.everyone.if.no.acl.found set to false for the
cluster. For information about how to update the configuration of a cluster, see the section called
“Amazon MSK configuration operations”. If you are using IAM access control and want to apply
authorization policies or update your authorization policies, see the section called “IAM access
control”. For information about Apache Kafka ACLs, see the section called “Apache Kafka ACLs”.
Multi-VPC private connectivity doesn’t support the t3.small instance type.
Multi-VPC private connectivity isn’t supported across AWS Regions, only on AWS accounts within
the same Region.
Amazon MSK doesn't support multi-VPC private connectivity to Apache ZooKeeper nodes.
Get started using multi-VPC private connectivity
Topics
Step 1: On the MSK cluster in Account A, turn on multi-VPC connectivity for IAM auth scheme on
the cluster
Step 2: Attach a cluster policy to the MSK cluster
Step 3: Cross-account user actions to configure client-managed VPC connections
This tutorial uses a common use case as an example of how you can use multi-VPC connectivity
to privately connect an Apache Kafka client to an MSK cluster from inside AWS but outside the
cluster's VPC. This process requires the cross-account user to create an MSK managed VPC
connection and configuration for each client, including required client permissions. The process
also requires the MSK cluster owner to enable PrivateLink connectivity on the MSK cluster and
select authentication schemes to control access to the cluster.
In different parts of this tutorial, we choose options that apply to this example. This doesn't mean
that they're the only options that work for setting up an MSK cluster or client instances.
The network configuration for this use case is as follows:
A cross-account user (Kafka client) and an MSK cluster are in the same AWS network/Region, but
in different accounts:
MSK cluster in Account A
Kafka client in Account B
The cross-account user will connect privately to the MSK cluster using the IAM auth scheme.
This tutorial assumes that there is a provisioned MSK cluster created with Apache Kafka version
2.7.1 or higher. The MSK cluster must be in an ACTIVE state before beginning the configuration
process. To avoid potential data loss or downtime, clients that will use multi-VPC private
connection to connect to the cluster should use Apache Kafka versions that are compatible with
the cluster.
The following diagram illustrates the architecture of Amazon MSK multi-VPC connectivity for a
client in a different AWS account.
Step 1: On the MSK cluster in Account A, turn on multi-VPC connectivity for IAM auth scheme
on the cluster
The MSK cluster owner needs to make configuration settings on the MSK cluster after the cluster is
created and in an ACTIVE state.
The cluster owner turns on multi-VPC private connectivity on the ACTIVE cluster for any auth
schemes that will be active on the cluster. This can be done using the UpdateConnectivity API or MSK
console. The IAM, SASL/SCRAM, and TLS auth schemes support multi-VPC private connectivity.
Multi-VPC private connectivity can’t be enabled for unauthenticated clusters.
For this use case, you’ll configure the cluster to use the IAM auth scheme.
Note
If you are configuring your MSK cluster to use SASL/SCRAM auth scheme, the Apache Kafka
ACLs property "allow.everyone.if.no.acl.found=false" is mandatory. See Apache
Kafka ACLs.
When you update multi-VPC private connectivity settings, Amazon MSK starts a rolling reboot of
broker nodes that updates the broker configurations. This can take 30 minutes or more to
complete. You can’t make other updates to the cluster while connectivity is being updated.
Turn on multi-VPC for selected auth schemes on the cluster in Account A using the console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/ for the account
where the cluster is located.
2. In the navigation pane, under MSK Clusters, choose Clusters to display the list of clusters in
the account.
3. Select the cluster to configure for multi-VPC private connectivity. The cluster must be in an
ACTIVE state.
4. Select the cluster Properties tab, and then go to Network settings.
5. Select the Edit drop down menu and select Turn on multi-VPC connectivity.
6. Select one or more authentication types you want turned on for this cluster. For this use case,
select IAM role-based authentication.
7. Select Save changes.
Example - UpdateConnectivity API that turns on Multi-VPC private connectivity auth schemes
on a cluster
As an alternative to the MSK console, you can use the UpdateConnectivity API to turn on multi-
VPC private connectivity and configure auth schemes on an ACTIVE cluster. The following example
shows the IAM auth scheme turned on for the cluster.
{
    "currentVersion": "K3T4TT2Z381HKD",
    "connectivityInfo": {
        "vpcConnectivity": {
            "clientAuthentication": {
                "sasl": {
                    "iam": {
                        "enabled": true
                    }
                }
            }
        }
    }
}
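As a sketch, you can pass an equivalent payload with the AWS CLI (the ARN and version are placeholders):

aws kafka update-connectivity --cluster-arn ClusterArn --current-version Current-Cluster-Version --connectivity-info '{"VpcConnectivity": {"ClientAuthentication": {"Sasl": {"Iam": {"Enabled": true}}}}}'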
Amazon MSK creates the networking infrastructure required for private connectivity. Amazon
MSK also creates a new set of bootstrap broker endpoints for each auth type that requires private
connectivity. Note that the plaintext auth scheme does not support multi-VPC private connectivity.
Step 2: Attach a cluster policy to the MSK cluster
The cluster owner can attach a cluster policy (also known as a resource-based policy) to the MSK
cluster where you will turn on multi-VPC private connectivity. The cluster policy gives the clients
permission to access the cluster from another account. Before you can edit the cluster policy, you
need the account ID(s) for the accounts that should have permission to access the MSK cluster. See
How Amazon MSK works with IAM.
The cluster owner must attach a cluster policy to the MSK cluster that authorizes the cross-account
user in Account B to get bootstrap brokers for the cluster and to authorize the following actions on
the MSK cluster in Account A:
CreateVpcConnection
GetBootstrapBrokers
DescribeCluster
DescribeClusterV2
Example
For reference, the following is an example of the JSON for a basic cluster policy, similar to the
default policy shown in the MSK console IAM policy editor.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "123456789012"
                ]
            },
            "Action": [
                "kafka:CreateVpcConnection",
                "kafka:GetBootstrapBrokers",
                "kafka:DescribeCluster",
                "kafka:DescribeClusterV2"
            ],
            "Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/testing/de8982fa-8222-4e87-8b20-9bf3cdfa1521-2"
        }
    ]
}
Attach a cluster policy to the MSK cluster
1. In the Amazon MSK console, under MSK Clusters, choose Clusters.
2. Scroll down to Security settings and select Edit cluster policy.
3. In the console, on the Edit Cluster Policy screen, select Basic policy for multi-VPC
connectivity.
4. In the Account ID field, enter the account ID for each account that should have permission to
access this cluster. As you type the ID, it is automatically copied over into the displayed policy
JSON syntax. In our example cluster policy, the Account ID is 123456789012.
5. Select Save changes.
For information about cluster policy APIs, see Amazon MSK resource-based policies.
Step 3: Cross-account user actions to configure client-managed VPC connections
To set up multi-VPC private connectivity between a client in a different account from the MSK
cluster, the cross-account user creates a managed VPC connection for the client. Multiple clients
can be connected to the MSK cluster by repeating this procedure. For the purposes of this use case,
you’ll configure just one client.
Clients can use the supported auth schemes IAM, SASL/SCRAM, or TLS. Each managed VPC
connection can have only one auth scheme associated with it. The client auth scheme must be
configured on the MSK cluster where the client will connect.
For this use case, configure the client auth scheme so that the client in Account B uses the IAM auth
scheme.
Prerequisites
This process requires the following items:
The previously created cluster policy that grants the client in Account B permission to perform
actions on the MSK cluster in Account A.
An identity policy attached to the client in Account B that grants permissions for
kafka:CreateVpcConnection, ec2:CreateTags, ec2:CreateVPCEndpoint, and
ec2:DescribeVpcAttribute actions.
Example
For reference, the following is an example of the JSON for a basic client identity policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka:CreateVpcConnection",
                "ec2:CreateTags",
                "ec2:CreateVPCEndpoint",
                "ec2:DescribeVpcAttribute"
            ],
            "Resource": "*"
        }
    ]
}
To create a managed VPC connection for a client in Account B
1. From the cluster administrator, get the Cluster ARN of the MSK cluster in Account A that you
want the client in Account B to connect to. Make note of the cluster ARN to use later.
2. In the MSK console for the client Account B, choose Managed VPC connections, and then
choose Create connection.
3. In the Connection settings pane, paste the cluster ARN into the cluster ARN text field, and
then choose Verify.
4. Select the Authentication type for the client in Account B. For this use case, choose IAM when
creating the client VPC connection.
5. Choose the VPC for the client.
6. Choose at least two Availability Zones and associated subnets. You can get the Availability
Zone IDs from the AWS Management Console cluster details or by using the DescribeCluster
API or the describe-cluster AWS CLI command. The zone IDs that you specify for the client
subnet must match those of the cluster subnet. If the values for a subnet are missing, first
create a subnet with the same zone ID as your MSK cluster.
7. Choose a Security group for this VPC connection. You can use the default security group.
For more information on configuring a security group, see Control traffic to resources using
security groups.
8. Select Create connection.
9. To get the list of new bootstrap broker strings from the cross-account user’s MSK console
(Cluster details > Managed VPC connection), see the bootstrap broker strings shown under
“Cluster connection string”. From the client Account B, the list of bootstrap brokers can be
viewed by calling the GetBootstrapBrokers API or by viewing the list of bootstrap brokers in
the console cluster details.
10. Update the security groups associated with the VPC connections as follows:
a. Set inbound rules for the PrivateLink VPC to allow all traffic for the IP range from the
Account B network.
b. [Optional] Set outbound rules for connectivity to the MSK cluster. Choose the Security
Group in the VPC console, Edit Outbound Rules, and add a rule for Custom TCP Traffic
for the port range 14001-14100. The multi-VPC network load balancer listens on the
14001-14100 port range. See Network Load Balancers.
11. Configure the client in Account B to use the new bootstrap brokers for multi-VPC private
connectivity to connect to the MSK cluster in Account A. See Produce and consume data.
After authorization is complete, Amazon MSK creates a managed VPC connection for each specified
VPC and auth scheme. The chosen security group is associated with each connection. This managed
VPC connection is configured by Amazon MSK to connect privately to the brokers. You can use the
new set of bootstrap brokers to connect privately to the Amazon MSK cluster.
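As an alternative to the console procedure above, the following is a hedged sketch of creating the managed VPC connection from Account B with the AWS CLI (all identifiers are placeholders; the subnets must satisfy the zone ID requirements described in step 6):

aws kafka create-vpc-connection --target-cluster-arn ClusterArn --authentication SASL_IAM --vpc-id vpc-0123456789abcdef0 --client-subnets subnet-0123456789abcdef0 subnet-0fedcba9876543210 --security-groups sg-0123456789abcdef0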
Update the authorization schemes on a cluster
Multi-VPC private connectivity supports several authorization schemes: SASL/SCRAM, IAM, and
TLS. The cluster owner can turn private connectivity on or off for one or more auth schemes. The
cluster has to be in the ACTIVE state to perform this action.
To turn on an auth scheme using the Amazon MSK console
1. Open the Amazon MSK console at AWS Management Console for the cluster that you want to
edit.
2. In the navigation pane, under MSK Clusters, choose Clusters to display the list of clusters in
the account.
3. Select the cluster that you want to edit. The cluster must be in an ACTIVE state.
4. Select the cluster Properties tab, and then go to Network settings.
5. Select the Edit dropdown menu and select Turn on multi-VPC connectivity to turn on a new
auth scheme.
6. Select one or more authentication types that you want turned on for this cluster.
7. Select Turn on selection.
When you turn on a new auth scheme, you should also create new managed VPC connections for
the new auth scheme and update your clients to use the bootstrap brokers specific to the new auth
scheme.
To turn off an auth scheme using the Amazon MSK console
Note
When you turn off multi-VPC private connectivity for auth schemes, all connectivity related
infrastructure, including the managed VPC connections, are deleted.
When you turn off multi-VPC private connectivity for auth schemes, existing VPC connections on
the client side change to INACTIVE, and the PrivateLink infrastructure on the cluster side, including
the managed VPC connections, is removed. The cross-account user can only delete the inactive
VPC connection. If private connectivity is turned on again on the cluster, the cross-account user
needs to create a new connection to the cluster.
1. Open the Amazon MSK console at AWS Management Console.
2. In the navigation pane, under MSK Clusters, choose Clusters to display the list of clusters in
the account.
3. Select the cluster you want to edit. The cluster must be in an ACTIVE state.
4. Select the cluster Properties tab, then go to Network settings.
5. Select the Edit drop down menu and select Turn off multi-VPC connectivity (to turn off an
auth scheme).
6. Select one or more authentication types you want turned off for this cluster.
7. Select Turn off selection.
Example: To turn an auth scheme on or off with the API
As an alternative to the MSK console, you can use the UpdateConnectivity API to turn on multi-
VPC private connectivity and configure auth schemes on an ACTIVE cluster. The following example
shows SASL/SCRAM and IAM auth schemes turned on for the cluster.
When you turn on a new auth scheme, you should also create new managed VPC connections for
the new auth scheme and update your clients to use the bootstrap brokers specific to the new auth
scheme.
When you turn off multi-VPC private connectivity for auth schemes, existing VPC connections on
the client side change to INACTIVE, and the PrivateLink infrastructure on the cluster side, including
the managed VPC connections, is removed. The cross-account user can only delete the inactive
VPC connection. If private connectivity is turned on again on the cluster, the cross-account user
needs to create a new connection to the cluster.
Request:
{
    "currentVersion": "string",
    "connectivityInfo": {
        "publicAccess": {
            "type": "string"
        },
        "vpcConnectivity": {
            "clientAuthentication": {
                "sasl": {
                    "scram": {
                        "enabled": true
                    },
                    "iam": {
                        "enabled": true
                    }
                },
                "tls": {
                    "enabled": false
                }
            }
        }
    }
}
Response:
{
    "clusterArn": "string",
    "clusterOperationArn": "string"
}
Reject a managed VPC connection to an Amazon MSK cluster
From the Amazon MSK console on the cluster admin account, you can reject a client VPC
connection. The client VPC connection must be in the AVAILABLE state to be rejected. You might
want to reject a managed VPC connection from a client that is no longer authorized to connect to
your cluster. To prevent a client from creating new managed VPC connections, deny access to the
client in the cluster policy. A rejected connection still incurs cost until it's deleted by the connection
owner. See Delete a managed VPC connection to an Amazon MSK cluster.
To reject a client VPC connection using the MSK console
1. Open the Amazon MSK console at AWS Management Console.
2. In the navigation pane, select Clusters and scroll to the Network settings > Client VPC
connections list.
3. Select the connection that you want to reject and select Reject client VPC connection.
4. Confirm that you want to reject the selected client VPC connection.
To reject a managed VPC connection using the API, use the RejectClientVpcConnection API.
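For example (the ARNs are placeholders):

aws kafka reject-client-vpc-connection --cluster-arn ClusterArn --vpc-connection-arn VpcConnectionArn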
Delete a managed VPC connection to an Amazon MSK cluster
The cross-account user can delete a managed VPC connection for an MSK cluster from the client
account console. Because the cluster owner doesn't own the managed VPC connection, the
connection can’t be deleted from the cluster admin account. Once a VPC connection is deleted, it
no longer incurs cost.
To delete a managed VPC connection using the MSK console
1. From the client account, open the Amazon MSK console at AWS Management Console.
2. In the navigation pane, select Managed VPC connections.
3. From the connection list, select the connection that you want to delete.
4. Confirm that you want to delete the VPC connection.
To delete a managed VPC connection using the API, use the DeleteVpcConnection API.
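For example (the ARN is a placeholder):

aws kafka delete-vpc-connection --arn VpcConnectionArn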
Permissions for multi-VPC private connectivity
This section summarizes the permissions needed for clients and clusters using the multi-VPC
private connectivity feature. Multi-VPC private connectivity requires the client admin to create
permissions on each client that will have a managed VPC connection to the MSK cluster. It also
requires the MSK cluster admin to enable PrivateLink connectivity on the MSK cluster and select
authentication schemes to control access to the cluster.
Cluster auth type and topic access permissions
Turn on the multi-VPC private connectivity feature for auth schemes that are enabled for
your MSK cluster. See Requirements and limitations for multi-VPC private connectivity. If you
are configuring your MSK cluster to use SASL/SCRAM auth scheme, the Apache Kafka ACLs
property allow.everyone.if.no.acl.found=false is mandatory. After you set the
Apache Kafka ACLs for your cluster, update the cluster's configuration to have the property
allow.everyone.if.no.acl.found set to false for the cluster. For information about how to
update the configuration of a cluster, see Amazon MSK configuration operations.
Cross-account cluster policy permissions
If a Kafka client is in an AWS account that is different from the MSK cluster's account, attach a
cluster-based policy to the MSK cluster that authorizes the client root user for cross-account connectivity.
You can edit the multi-VPC cluster policy using the IAM policy editor in the MSK console (cluster
Security settings > Edit cluster policy), or use the following APIs to manage the cluster policy:
PutClusterPolicy
Attaches the cluster policy to the cluster. You can use this API to create or update the specified
MSK cluster policy. If you’re updating the policy, the currentVersion field is required in the
request payload.
GetClusterPolicy
Retrieves the JSON text of the cluster policy document attached to the cluster.
DeleteClusterPolicy
Deletes the cluster policy.
The following is an example of the JSON for a basic cluster policy, similar to the one shown in the
MSK console IAM policy editor.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "123456789012"
                ]
            },
            "Action": [
                "kafka:CreateVpcConnection",
                "kafka:GetBootstrapBrokers",
                "kafka:DescribeCluster",
                "kafka:DescribeClusterV2"
            ],
            "Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/testing/de8982fa-8222-4e87-8b20-9bf3cdfa1521-2"
        }
    ]
}
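For example, assuming you save a policy like the one above as cluster-policy.json, a sketch of attaching it and reading it back with the AWS CLI looks like the following (the ARN is a placeholder):

aws kafka put-cluster-policy --cluster-arn ClusterArn --policy file://cluster-policy.json
aws kafka get-cluster-policy --cluster-arn ClusterArn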
Client permissions for multi-VPC private connectivity to an MSK cluster
To set up multi-VPC private connectivity between a Kafka client and an MSK cluster, the client
requires an attached identity policy that grants permissions for kafka:CreateVpcConnection,
ec2:CreateTags and ec2:CreateVPCEndpoint actions on the client. For reference, the
following is an example of the JSON for a basic client identity policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka:CreateVpcConnection",
                "ec2:CreateTags",
                "ec2:CreateVPCEndpoint"
            ],
            "Resource": "*"
        }
    ]
}
Port information
Use the following port numbers so that Amazon MSK can communicate with client machines (a sample client configuration follows this list):
To communicate with brokers in plaintext, use port 9092.
To communicate with brokers with TLS encryption, use port 9094 for access from within AWS
and port 9194 for public access.
To communicate with brokers with SASL/SCRAM, use port 9096 for access from within AWS and
port 9196 for public access.
To communicate with brokers in a cluster that is set up to use the section called “IAM access
control”, use port 9098 for access from within AWS and port 9198 for public access.
To communicate with Apache ZooKeeper by using TLS encryption, use port 2182. Apache
ZooKeeper nodes use port 2181 by default.
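For example, a minimal client.properties sketch for a client that uses IAM access control from within AWS (the broker addresses are placeholders, and the properties assume the aws-msk-iam-auth library is on the client's classpath):

bootstrap.servers=b-1.example.abc123.c2.kafka.us-east-1.amazonaws.com:9098,b-2.example.abc123.c2.kafka.us-east-1.amazonaws.com:9098
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler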
Migrate to an Amazon MSK Cluster
Amazon MSK Replicator can be used for MSK cluster migration. See What is Amazon MSK
Replicator?. Alternatively, you can use Apache MirrorMaker 2.0 to migrate from a non-MSK cluster
to an Amazon MSK cluster. For an example of how to do this, see Migrate an on-premises Apache
Kafka cluster to Amazon MSK by using MirrorMaker. For information about how to use MirrorMaker,
see Mirroring data between clusters in the Apache Kafka documentation. We recommend setting
up MirrorMaker in a highly available configuration.
An outline of the steps to follow when using MirrorMaker to migrate to an MSK cluster
1. Create the destination MSK cluster
2. Start MirrorMaker from an Amazon EC2 instance within the same Amazon VPC as the
destination cluster.
3. Inspect the MirrorMaker lag.
4. After MirrorMaker catches up, redirect producers and consumers to the new cluster using the
MSK cluster bootstrap brokers.
5. Shut down MirrorMaker.
Migrate your Apache Kafka cluster to Amazon MSK
Suppose that you have an Apache Kafka cluster named CLUSTER_ONPREM. That cluster is
populated with topics and data. If you want to migrate that cluster to a newly created Amazon
MSK cluster named CLUSTER_AWSMSK, this procedure provides a high-level view of the steps that
you need to follow.
To migrate your existing Apache Kafka cluster to Amazon MSK
1. In CLUSTER_AWSMSK, create all the topics that you want to migrate.
You can't use MirrorMaker for this step because it doesn't automatically re-create the
topics that you want to migrate with the right replication level. You can create the topics
in Amazon MSK with the same replication factors and numbers of partitions that they had
in CLUSTER_ONPREM. You can also create the topics with different replication factors and
numbers of partitions.
2. Start MirrorMaker from an instance that has read access to CLUSTER_ONPREM and write access
to CLUSTER_AWSMSK.
3. Run the following command to mirror all topics:

<path-to-your-kafka-installation>/bin/kafka-mirror-maker.sh --consumer.config config/mirrormaker-consumer.properties --producer.config config/mirrormaker-producer.properties --whitelist '.*'

In this command, config/mirrormaker-consumer.properties points to a bootstrap broker in
CLUSTER_ONPREM; for example, bootstrap.servers=localhost:9092. And
config/mirrormaker-producer.properties points to a bootstrap broker in CLUSTER_AWSMSK;
for example, bootstrap.servers=10.0.0.237:9092,10.0.2.196:9092,10.0.1.233:9092.
4. Keep MirrorMaker running in the background, and continue to use CLUSTER_ONPREM.
MirrorMaker mirrors all new data.
5. Check the progress of mirroring by inspecting the lag between the last offset for each topic
and the current offset from which MirrorMaker is consuming.
Remember that MirrorMaker simply uses a consumer and a producer, so you can check the
lag using the kafka-consumer-groups.sh tool (see the example after this procedure). To find
the consumer group name, look inside the mirrormaker-consumer.properties file for the
group.id, and use its value. If there is no such key in the file, you can create it. For example,
set group.id=mirrormaker-consumer-group.
6. After MirrorMaker finishes mirroring all topics, stop all producers and consumers, and then
stop MirrorMaker. Then redirect the producers and consumers to the CLUSTER_AWSMSK cluster
by changing their producer and consumer bootstrap brokers values. Restart all producers and
consumers on CLUSTER_AWSMSK.
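For reference, a minimal sketch of the two properties files used in step 3, followed by the lag check from step 5 (the broker addresses are placeholders):

# config/mirrormaker-consumer.properties (points to CLUSTER_ONPREM)
bootstrap.servers=localhost:9092
group.id=mirrormaker-consumer-group
enable.auto.commit=false

# config/mirrormaker-producer.properties (points to CLUSTER_AWSMSK)
bootstrap.servers=10.0.0.237:9092,10.0.2.196:9092,10.0.1.233:9092
acks=all

To inspect the lag for that consumer group:

<path-to-your-kafka-installation>/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mirrormaker-consumer-group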
Migrate from one Amazon MSK cluster to another
You can use Apache MirrorMaker 2.0 to migrate from one MSK cluster to another. For
example, you can migrate from one version of Apache Kafka to another. For an example of how to
do this, see Migrate an on-premises Apache Kafka cluster to Amazon MSK by using MirrorMaker.
Alternatively, Amazon MSK Replicator can be used for MSK cluster migration. For more information
about Amazon MSK Replicator, see What is Amazon MSK Replicator?.
MirrorMaker 1.0 best practices
This list of best practices applies to MirrorMaker 1.0.
Run MirrorMaker on the destination cluster. This way, if a network problem happens, the
messages are still available in the source cluster. If you run MirrorMaker on the source cluster and
events are buffered in the producer and there is a network issue, events might be lost.
If encryption is required in transit, run it in the source cluster.
For consumers, set enable.auto.commit=false.
For producers, set the following (see the sample producer properties at the end of this section):
max.in.flight.requests.per.connection=1
retries=Int.Max_Value
acks=all
max.block.ms=Long.Max_Value
For a high producer throughput:
Buffer messages and fill message batches: tune buffer.memory, batch.size, and linger.ms.
Tune socket buffers: receive.buffer.bytes and send.buffer.bytes.
To avoid data loss, turn off auto commit at the source, so that MirrorMaker can control the
commits, which it typically does after it receives the ack from the destination cluster. If the
producer has acks=all and the destination cluster has min.insync.replicas set to more than 1,
the messages are persisted on more than one broker at the destination before the MirrorMaker
consumer commits the offset at the source.
If order is important, you can set retries to 0. Alternatively, for a production environment, set
max inflight connections to 1 to ensure that the batches sent out are not committed out of order
if a batch fails in the middle. This way, each batch sent is retried until the next batch is sent out.
If max.block.ms is not set to the maximum value, and if the producer buffer is full, there can
be data loss (depending on some of the other settings). This can block and back-pressure the
consumer.
For high throughput
Increase buffer.memory.
Increase batch size.
Tune linger.ms to allow the batches to fill. This also allows for better compression, less
network bandwidth usage, and less storage on the cluster. This results in increased retention.
Monitor CPU and memory usage.
For high consumer throughput
Increase the number of threads/consumers per MirrorMaker process — num.streams.
Increase the number of MirrorMaker processes across machines first before increasing threads
to allow for high availability.
Increase the number of MirrorMaker processes first on the same machine and then on different
machines (with the same group ID).
Isolate topics that have very high throughput and use separate MirrorMaker instances.
For management and configuration
Use AWS CloudFormation and configuration management tools like Chef and Ansible.
Use Amazon EFS mounts to keep all configuration files accessible from all Amazon EC2
instances.
Use containers for easy scaling and management of MirrorMaker instances.
Typically, it takes more than one consumer to saturate a producer in MirrorMaker. So, set up
multiple consumers. First, set them up on different machines to provide high availability. Then,
scale individual machines up to having a consumer for each partition, with consumers equally
distributed across machines.
For high throughput ingestion and delivery, tune the receive and send buffers because their
defaults might be too low. For maximum performance, ensure that the total number of streams
(num.streams) matches all of the topic partitions that MirrorMaker is trying to copy to the
destination cluster.
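Putting the producer-side recommendations above together, the following is a sketch of a MirrorMaker 1.0 producer properties file. The numeric values spell out Int.Max_Value and Long.Max_Value, and the throughput settings are illustrative starting points to tune for your workload.

# mirrormaker-producer.properties (sketch)
acks=all
max.in.flight.requests.per.connection=1
retries=2147483647
max.block.ms=9223372036854775807
# throughput tuning - adjust for your workload
buffer.memory=67108864
batch.size=131072
linger.ms=100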
Advantages of MirrorMaker 2.*
Makes use of the Apache Kafka Connect framework and ecosystem.
Detects new topics and partitions.
Automatically syncs topic configuration between clusters.
Supports "active/active" cluster pairs, as well as any number of active clusters.
Provides new metrics including end-to-end replication latency across multiple data centers and
clusters.
Emits offsets required to migrate consumers between clusters and provides tooling for offset
translation.
Supports a high-level configuration file for specifying multiple clusters and replication flows
in one place, compared to low-level producer/consumer properties for each MirrorMaker 1.*
process (see the sketch following this list).
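For illustration, a minimal sketch of such a configuration file (the cluster aliases and broker addresses are placeholders); MirrorMaker 2.* can then be started with connect-mirror-maker.sh mm2.properties:

# mm2.properties (sketch)
clusters = onprem, msk
onprem.bootstrap.servers = localhost:9092
msk.bootstrap.servers = 10.0.0.237:9092,10.0.2.196:9092,10.0.1.233:9092
# replicate every topic from onprem to msk
onprem->msk.enabled = true
onprem->msk.topics = .*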
Monitor an Amazon MSK cluster
There are several ways that Amazon MSK helps you monitor the status of your Amazon MSK
cluster.
Amazon MSK helps you monitor your disk storage capacity by automatically sending you storage
capacity alerts when a cluster is about to reach its storage capacity limit. The alerts also provide
recommendations on the best steps to take to address detected issues. This helps you to identify
and quickly resolve disk capacity issues before they become critical. Amazon MSK automatically
sends these alerts to the Amazon MSK console, AWS Health Dashboard, Amazon EventBridge,
and email contacts for your AWS account. For information about storage capacity alerts, see Use
Amazon MSK storage capacity alerts.
Amazon MSK gathers Apache Kafka metrics and sends them to Amazon CloudWatch where
you can view them. For more information about Apache Kafka metrics, including the ones that
Amazon MSK surfaces, see Monitoring in the Apache Kafka documentation.
You can also monitor your MSK cluster with Prometheus, an open-source monitoring application.
For information about Prometheus, see Overview in the Prometheus documentation. To learn
how to monitor your cluster with Prometheus, see the section called “Monitor with Prometheus”.
Topics
Amazon MSK metrics for monitoring with CloudWatch
View Amazon MSK metrics using CloudWatch
Monitor consumer lags
Monitor MSK cluster with Prometheus
Use Amazon MSK storage capacity alerts
Amazon MSK metrics for monitoring with CloudWatch
Amazon MSK integrates with Amazon CloudWatch so that you can collect, view, and analyze
CloudWatch metrics for your Amazon MSK cluster. The metrics that you configure for your MSK
cluster are automatically collected and pushed to CloudWatch at 1-minute intervals. You can
set the monitoring level for an MSK cluster to one of the following: DEFAULT, PER_BROKER,
PER_TOPIC_PER_BROKER, or PER_TOPIC_PER_PARTITION. The tables in the following sections
show all the metrics that are available starting at each monitoring level.
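For example, a sketch of raising the monitoring level of an existing cluster with the AWS CLI (the ARN and version are placeholders):

aws kafka update-monitoring --cluster-arn ClusterArn --current-version Current-Cluster-Version --enhanced-monitoring PER_BROKER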
Note
The names of some Amazon MSK metrics for CloudWatch monitoring have changed in
version 3.6.0 and higher. Use the new names for monitoring these metrics. For metrics with
changed names, the table below shows the name used in version 3.6.0 and higher, followed
by the name in version 2.8.2.tiered.
DEFAULT-level metrics are free. Pricing for other metrics is described on the Amazon CloudWatch
pricing page.
DEFAULT Level monitoring
The metrics described in the following table are available at the DEFAULT monitoring level. They
are free.
Metrics available at the DEFAULT monitoring level

Name | When visible | Dimensions | Description
ActiveControllerCount | After the cluster gets to the ACTIVE state. | Cluster Name | Only one controller per cluster should be active at any given time.
BurstBalance | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The remaining balance of input-output burst credits for EBS volumes in the cluster. Use it to investigate latency or decreased throughput. BurstBalance is not reported for EBS volumes when the baseline performance of a volume is higher than the maximum burst performance. For more information, see I/O Credits and burst performance.
BytesInPerSec | After you create a topic. | Cluster Name, Broker ID, Topic | The number of bytes per second received from clients. This metric is available per broker and also per topic.
BytesOutPerSec | After you create a topic. | Cluster Name, Broker ID, Topic | The number of bytes per second sent to clients. This metric is available per broker and also per topic.
ClientConnectionCount | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID, Client Authentication | The number of active authenticated client connections.
ConnectionCount | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of active authenticated, unauthenticated, and inter-broker connections.
CPUCreditBalance | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of earned CPU credits that a broker has accrued since it was launched. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. If you run out of the CPU credit balance, it can have a negative impact on your cluster's performance. You can take steps to reduce CPU load. For example, you can reduce the number of client requests or update the broker type to an M5 broker type.
CpuIdle | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU idle time.
CpuIoWait | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU idle time during a pending disk operation.
CpuSystem | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU in kernel space.
CpuUser | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU in user space.
GlobalPartitionCount | After the cluster gets to the ACTIVE state. | Cluster Name | The number of partitions across all topics in the cluster, excluding replicas. Because GlobalPartitionCount doesn't include replicas, the sum of the PartitionCount values can be higher than GlobalPartitionCount if the replication factor for a topic is greater than 1.
GlobalTopicCount | After the cluster gets to the ACTIVE state. | Cluster Name | Total number of topics across all brokers in the cluster.
EstimatedMaxTimeLag | After consumer group consumes from a topic. | Consumer Group, Topic | Time estimate (in seconds) to drain MaxOffsetLag.
KafkaAppLogsDiskUsed | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of disk space used for application logs.
KafkaDataLogsDiskUsed (Cluster Name, Broker ID dimension) | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of disk space used for data logs.
KafkaDataLogsDiskUsed (Cluster Name dimension) | After the cluster gets to the ACTIVE state. | Cluster Name | The percentage of disk space used for data logs.
LeaderCount | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The total number of leaders of partitions per broker, not including replicas.
MaxOffsetLag | After consumer group consumes from a topic. | Consumer Group, Topic | The maximum offset lag across all partitions in a topic.
MemoryBuffered | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The size in bytes of buffered memory for the broker.
MemoryCached | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The size in bytes of cached memory for the broker.
MemoryFree | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The size in bytes of memory that is free and available for the broker.
HeapMemoryAfterGC | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of total heap memory in use after garbage collection.
MemoryUsed | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The size in bytes of memory that is in use for the broker.
MessagesInPerSec | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of incoming messages per second for the broker.
NetworkRxDropped | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of dropped receive packets.
NetworkRxErrors | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of network receive errors for the broker.
NetworkRxPackets | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of packets received by the broker.
NetworkTxDropped | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of dropped transmit packets.
NetworkTxErrors | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of network transmit errors for the broker.
NetworkTxPackets | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of packets transmitted by the broker.
OfflinePartitionsCount | After the cluster gets to the ACTIVE state. | Cluster Name | Total number of partitions that are offline in the cluster.
PartitionCount | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The total number of topic partitions per broker, including replicas.
ProduceTotalTimeMsMean | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The mean produce time in milliseconds.
RequestBytesMean | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The mean number of request bytes for the broker.
RequestTime | After request throttling is applied. | Cluster Name, Broker ID | The average time in milliseconds spent in broker network and I/O threads to process requests.
RootDiskUsed | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The percentage of the root disk used by the broker.
SumOffsetLag | After consumer group consumes from a topic. | Consumer Group, Topic | The aggregated offset lag for all the partitions in a topic.
SwapFree | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The size in bytes of swap memory that is available for the broker.
SwapUsed | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The size in bytes of swap memory that is in use for the broker.
TrafficShaping | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | High-level metrics indicating the number of packets shaped (dropped or queued) due to exceeding network allocations. Finer detail is available with PER_BROKER metrics.
UnderMinIsrPartitionCount | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of under minIsr partitions for the broker.
UnderReplicatedPartitions | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | The number of under-replicated partitions for the broker.
ZooKeeperRequestLatencyMsMean | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | For ZooKeeper-based clusters. The mean latency in milliseconds for Apache ZooKeeper requests from the broker.
ZooKeeperSessionState | After the cluster gets to the ACTIVE state. | Cluster Name, Broker ID | For ZooKeeper-based clusters. Connection status of the broker's ZooKeeper session, which may be one of the following: NOT_CONNECTED: '0.0', ASSOCIATING: '0.1', CONNECTING: '0.5', CONNECTEDREADONLY: '0.8', CONNECTED: '1.0', CLOSED: '5.0', AUTH_FAILED: '10.0'.
PER_BROKER Level monitoring
When you set the monitoring level to PER_BROKER, you get the metrics described in the following
table in addition to all the DEFAULT level metrics. You pay for the metrics in the following table,
whereas the DEFAULT level metrics continue to be free. The metrics in this table have the following
dimensions: Cluster Name, Broker ID.
Additional metrics that are available starting at the PER_BROKER monitoring level

Name | When visible | Description
BwInAllowanceExceeded | After the cluster gets to the ACTIVE state. | The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the broker.
BwOutAllowanceExceeded | After the cluster gets to the ACTIVE state. | The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the broker.
ConnTrackAllowanceExceeded | After the cluster gets to the ACTIVE state. | The number of packets shaped because the connection tracking exceeded the maximum for the broker. Connection tracking is related to security groups that track each connection established to ensure that return packets are delivered as expected.
ConnectionCloseRate | After the cluster gets to the ACTIVE state. | The number of connections closed per second per listener. This number is aggregated per listener and filtered for the client listeners.
ConnectionCreationRate | After the cluster gets to the ACTIVE state. | The number of new connections established per second per listener. This number is aggregated per listener and filtered for the client listeners.
CpuCreditUsage | After the cluster gets to the ACTIVE state. | The number of CPU credits spent by the broker. If you run out of the CPU credit balance, it can have a negative impact on your cluster's performance. You can take steps to reduce CPU load. For example, you can reduce the number of client requests or update the broker type to an M5 broker type.
FetchConsumerLocalTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds that the consumer request is processed at the leader.
FetchConsumerRequestQueueTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds that the consumer request waits in the request queue.
FetchConsumerResponseQueueTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds that the consumer request waits in the response queue.
FetchConsumerResponseSendTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds for the consumer to send a response.
FetchConsumerTotalTimeMsMean | After there's a producer/consumer. | The mean total time in milliseconds that consumers spend on fetching data from the broker.
FetchFollowerLocalTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds that the follower request is processed at the leader.
FetchFollowerRequestQueueTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds that the follower request waits in the request queue.
FetchFollowerResponseQueueTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds that the follower request waits in the response queue.
FetchFollowerResponseSendTimeMsMean | After there's a producer/consumer. | The mean time in milliseconds for the follower to send a response.
FetchFollowerTotalTimeMsMean | After there's a producer/consumer. | The mean total time in milliseconds that followers spend on fetching data from the broker.
FetchMessageConversionsPerSec | After you create a topic. | The number of fetch message conversions per second for the broker.
FetchThrottleByteRate | After bandwidth throttling is applied. | The number of throttled bytes per second.
FetchThrottleQueueSize | After bandwidth throttling is applied. | The number of messages in the throttle queue.
FetchThrottleTime | After bandwidth throttling is applied. | The average fetch throttle time in milliseconds.
IAMNumberOfConnectionRequests | After the cluster gets to the ACTIVE state. | The number of IAM authentication requests per second.
IAMTooManyConnections | After the cluster gets to the ACTIVE state. | The number of connections attempted beyond 100. 0 means the number of connections is within the limit. If >0, the throttle limit is being exceeded and you need to reduce the number of connections.
NetworkProcessorAvgIdlePercent | After the cluster gets to the ACTIVE state. | The average percentage of the time the network processors are idle.
PpsAllowanceExceeded | After the cluster gets to the ACTIVE state. | The number of packets shaped because the bidirectional PPS exceeded the maximum for the broker.
ProduceLocalTimeMsMean | After the cluster gets to the ACTIVE state. | The mean time in milliseconds that the request is processed at the leader.
ProduceMessageConversionsPerSec | After you create a topic. | The number of produce message conversions per second for the broker.
ProduceMessageConversionsTimeMsMean | After the cluster gets to the ACTIVE state. | The mean time in milliseconds spent on message format conversions.
ProduceRequestQueueTimeMsMean | After the cluster gets to the ACTIVE state. | The mean time in milliseconds that request messages spend in the queue.
ProduceResponseQueueTimeMsMean | After the cluster gets to the ACTIVE state. | The mean time in milliseconds that response messages spend in the queue.
ProduceResponseSendTimeMsMean | After the cluster gets to the ACTIVE state. | The mean time in milliseconds spent on sending response messages.
ProduceThrottleByteRate | After bandwidth throttling is applied. | The number of throttled bytes per second.
ProduceThrottleQueueSize | After bandwidth throttling is applied. | The number of messages in the throttle queue.
ProduceThrottleTime | After bandwidth throttling is applied. | The average produce throttle time in milliseconds.
ProduceTotalTimeMsMean | After the cluster gets to the ACTIVE state. | The mean produce time in milliseconds.
RemoteFetchBytesPerSec (RemoteBytesInPerSec in v2.8.2.tiered) | After there's a producer/consumer. | The total number of bytes transferred from tiered storage in response to consumer fetches. This metric includes all topic-partitions that contribute to downstream data transfer traffic. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteCopyBytesPerSec (RemoteBytesOutPerSec in v2.8.2.tiered) | After there's a producer/consumer. | The total number of bytes transferred to tiered storage, including data from log segments, indexes, and other auxiliary files. This metric includes all topic-partitions that contribute to upstream data transfer traffic. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteLogManagerTasksAvgIdlePercent | After the cluster gets to the ACTIVE state. | The average percentage of time the remote log manager spent idle. The remote log manager transfers data from the broker to tiered storage. Category: Internal activity. This is a KIP-405 metric.
RemoteLogReaderAvgIdlePercent | After the cluster gets to the ACTIVE state. | The average percentage of time the remote log reader spent idle. The remote log reader transfers data from the remote storage to the broker in response to consumer fetches. Category: Internal activity. This is a KIP-405 metric.
RemoteLogReaderTaskQueueSize | After the cluster gets to the ACTIVE state. | The number of tasks responsible for reads from tiered storage that are waiting to be scheduled. Category: Internal activity. This is a KIP-405 metric.
RemoteFetchErrorsPerSec (RemoteReadErrorPerSec in v2.8.2.tiered) | After the cluster gets to the ACTIVE state. | The total rate of errors in response to read requests that the specified broker sent to tiered storage to retrieve data in response to consumer fetches. This metric includes all topic partitions that contribute to downstream data transfer traffic. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteFetchRequestsPerSec (RemoteReadRequestsPerSec in v2.8.2.tiered) | After the cluster gets to the ACTIVE state. | The total number of read requests that the specified broker sent to tiered storage to retrieve data in response to consumer fetches. This metric includes all topic partitions that contribute to downstream data transfer traffic. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteCopyErrorsPerSec (RemoteWriteErrorPerSec in v2.8.2.tiered) | After the cluster gets to the ACTIVE state. | The total rate of errors in response to write requests that the specified broker sent to tiered storage to transfer data upstream. This metric includes all topic partitions that contribute to upstream data transfer traffic. Category: Traffic and error rates. This is a KIP-405 metric.
ReplicationBytesInPerSec | After you create a topic. | The number of bytes per second received from other brokers.
ReplicationBytesOutPerSec | After you create a topic. | The number of bytes per second sent to other brokers.
RequestExemptFromThrottleTime | After request throttling is applied. | The average time in milliseconds spent in broker network and I/O threads to process requests that are exempt from throttling.
RequestHandlerAvgIdlePercent | After the cluster gets to the ACTIVE state. | The average percentage of the time the request handler threads are idle.
RequestThrottleQueueSize | After request throttling is applied. | The number of messages in the throttle queue.
RequestThrottleTime | After request throttling is applied. | The average request throttle time in milliseconds.
TcpConnections | After the cluster gets to the ACTIVE state. | Shows the number of incoming and outgoing TCP segments with the SYN flag set.
RemoteCopyLagBytes (TotalTierBytesLag in v2.8.2.tiered) | After you create a topic. | The total number of bytes of data that is eligible for tiering on the broker but has not been transferred to tiered storage yet. This metric shows the efficiency of upstream data transfer. As the lag increases, the amount of data that doesn't persist in tiered storage increases. Category: Archive lag. This is not a KIP-405 metric.
TrafficBytes | After the cluster gets to the ACTIVE state. | Shows network traffic in overall bytes between clients (producers and consumers) and brokers. Traffic between brokers isn't reported.
VolumeQueueLength | After the cluster gets to the ACTIVE state. | The number of read and write operation requests waiting to be completed in a specified time period.
VolumeReadBytes | After the cluster gets to the ACTIVE state. | The number of bytes read in a specified time period.
VolumeReadOps | After the cluster gets to the ACTIVE state. | The number of read operations in a specified time period.
VolumeTotalReadTime | After the cluster gets to the ACTIVE state. | The total number of seconds spent by all read operations that completed in a specified time period.
VolumeTotalWriteTime | After the cluster gets to the ACTIVE state. | The total number of seconds spent by all write operations that completed in a specified time period.
VolumeWriteBytes | After the cluster gets to the ACTIVE state. | The number of bytes written in a specified time period.
VolumeWriteOps | After the cluster gets to the ACTIVE state. | The number of write operations in a specified time period.
PER_TOPIC_PER_BROKER Level monitoring
When you set the monitoring level to PER_TOPIC_PER_BROKER, you get the metrics described in
the following table, in addition to all the metrics from the PER_BROKER and DEFAULT levels. Only
the DEFAULT level metrics are free. The metrics in this table have the following dimensions: Cluster
Name, Broker ID, Topic.
Important
For an Amazon MSK cluster that uses Apache Kafka 2.4.1 or a newer version, the metrics
in the following table appear only after their values become nonzero for the first time.
For example, to see BytesInPerSec, one or more producers must first send data to the
cluster.
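For example, the following is a minimal sketch that uses the console producer that ships with
Apache Kafka to send a record so that BytesInPerSec starts reporting for a topic; the bootstrap
broker string and topic name are placeholder values.

<path-to-your-kafka-installation>/bin/kafka-console-producer.sh \
  --bootstrap-server <your-bootstrap-broker-string> \
  --topic ExampleTopic

Type a line of text and press Enter to send a record, then allow a minute or two for the metric
to appear in CloudWatch.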
Additional metrics that are available starting at the PER_TOPIC_PER_BROKER monitoring level

Name | When visible | Description
FetchMessageConversionsPerSec | After you create a topic. | The number of fetched messages converted per second.
MessagesInPerSec | After you create a topic. | The number of messages received per second.
ProduceMessageConversionsPerSec | After you create a topic. | The number of conversions per second for produced messages.
RemoteFetchBytesPerSec (RemoteBytesInPerSec in v2.8.2.tiered) | After you create a topic and the topic is producing/consuming. | The number of bytes transferred from tiered storage in response to consumer fetches for the specified topic and broker. This metric includes all partitions from the topic that contribute to downstream data transfer traffic on the specified broker. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteCopyBytesPerSec (RemoteBytesOutPerSec in v2.8.2.tiered) | After you create a topic and the topic is producing/consuming. | The number of bytes transferred to tiered storage for the specified topic and broker. This metric includes all partitions from the topic that contribute to upstream data transfer traffic on the specified broker. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteFetchErrorsPerSec (RemoteReadErrorPerSec in v2.8.2.tiered) | After you create a topic and the topic is producing/consuming. | The rate of errors in response to read requests that the specified broker sends to tiered storage to retrieve data in response to consumer fetches on the specified topic. This metric includes all partitions from the topic that contribute to downstream data transfer traffic on the specified broker. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteFetchRequestsPerSec (RemoteReadRequestsPerSec in v2.8.2.tiered) | After you create a topic and the topic is producing/consuming. | The number of read requests that the specified broker sends to tiered storage to retrieve data in response to consumer fetches on the specified topic. This metric includes all partitions from the topic that contribute to downstream data transfer traffic on the specified broker. Category: Traffic and error rates. This is a KIP-405 metric.
RemoteCopyErrorsPerSec (RemoteWriteErrorPerSec in v2.8.2.tiered) | After you create a topic and the topic is producing/consuming. | The rate of errors in response to write requests that the specified broker sends to tiered storage to transfer data upstream. This metric includes all partitions from the topic that contribute to upstream data transfer traffic on the specified broker. Category: Traffic and error rates. This is a KIP-405 metric.
PER_TOPIC_PER_PARTITION Level monitoring
When you set the monitoring level to PER_TOPIC_PER_PARTITION, you get the metrics
described in the following table, in addition to all the metrics from the PER_TOPIC_PER_BROKER,
PER_BROKER, and DEFAULT levels. Only the DEFAULT level metrics are free. The metrics in this
table have the following dimensions: Consumer Group, Topic, Partition.
Additional metrics that are available starting at the PER_TOPIC_PER_PARTITION monitoring level

Name | When visible | Description
EstimatedTimeLag | After consumer group consumes from a topic. | Time estimate (in seconds) to drain the partition offset lag.
OffsetLag | After consumer group consumes from a topic. | Partition-level consumer lag in number of offsets.
View Amazon MSK metrics using CloudWatch
You can monitor metrics for Amazon MSK using the CloudWatch console, the command line, or the
CloudWatch API. The following procedures show you how to access metrics using these different
methods.
To access metrics using the CloudWatch console
1. Sign in to the AWS Management Console and open the CloudWatch console at https://
console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Metrics.
3. Choose the All metrics tab, and then choose AWS/Kafka.
4. To view topic-level metrics, choose Topic, Broker ID, Cluster Name; for broker-level metrics,
choose Broker ID, Cluster Name; and for cluster-level metrics, choose Cluster Name.
5. (Optional) In the graph pane, select a statistic and a time period, and then create a
CloudWatch alarm using these settings.
To access metrics using the AWS CLI
Use the list-metrics and get-metric-statistics commands.
To access metrics using the CloudWatch CLI
Use the mon-list-metrics and mon-get-stats commands.
To access metrics using the CloudWatch API
Use the ListMetrics and GetMetricStatistics operations.
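For example, the following AWS CLI command is a sketch of retrieving five-minute averages of
BytesInPerSec for a single broker; the cluster name, broker ID, and time range are placeholder
values.

aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name BytesInPerSec \
  --dimensions Name="Cluster Name",Value="ExampleCluster" Name="Broker ID",Value="1" \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 300 \
  --statistics Average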
Monitor consumer lags
Monitoring consumer lag allows you to identify slow or stuck consumers that aren't keeping up
with the latest data available in a topic. When necessary, you can then take remedial actions,
such as scaling or rebooting those consumers. To monitor consumer lag, you can use Amazon
CloudWatch or open monitoring with Prometheus.
Consumer lag metrics quantify the difference between the latest data written to your topics
and the data read by your applications. Amazon MSK provides the following consumer-lag
metrics, which you can get through Amazon CloudWatch or through open monitoring with
Prometheus: EstimatedMaxTimeLag, EstimatedTimeLag, MaxOffsetLag, OffsetLag,
and SumOffsetLag. For information about these metrics, see the section called “Metrics for
monitoring with CloudWatch”.
Note
Consumer-lag metrics are visible only for consumer groups in a STABLE state. A consumer
group is STABLE after the successful completion of rebalancing, ensuring that partitions
are evenly distributed among the consumers.
Amazon MSK supports consumer lag metrics for clusters with Apache Kafka 2.2.1 or a later version.
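For example, the following AWS CLI sketch reads the MaxOffsetLag metric for one consumer group
and topic; the consumer group name, topic name, and time range are placeholder values.

aws cloudwatch get-metric-statistics \
  --namespace AWS/Kafka \
  --metric-name MaxOffsetLag \
  --dimensions Name="Consumer Group",Value="ExampleGroup" Name="Topic",Value="ExampleTopic" \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 60 \
  --statistics Maximum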
Monitor MSK cluster with Prometheus
You can monitor your MSK cluster with Prometheus, an open-source monitoring system for time-
series metric data. You can publish this data to Amazon Managed Service for Prometheus using
Prometheus's remote write feature. You can also use tools that are compatible with Prometheus-
formatted metrics or tools that integrate with Amazon MSK Open Monitoring, like Datadog,
Lenses, New Relic, and Sumo Logic. Open monitoring is available for free, but charges apply for
the transfer of data across Availability Zones. For information about Prometheus, see the Prometheus
documentation.
Enable open monitoring on a new MSK cluster
Using the AWS Management Console
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. In the Monitoring section, select the check box next to Enable open monitoring with
Prometheus.
3. Provide the required information in all the sections of the page, and review all the available
options.
4. Choose Create cluster.
Using the AWS CLI
Invoke the create-cluster command and specify its open-monitoring option. Enable the
JmxExporter, the NodeExporter, or both. If you specify open-monitoring, the two
exporters can't be disabled at the same time.
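For example, the following is a minimal sketch of the open-monitoring option on create-cluster;
the cluster name, Apache Kafka version, and broker-node-group file are placeholder values.

aws kafka create-cluster \
  --cluster-name "ExampleCluster" \
  --broker-node-group-info file://brokernodegroupinfo.json \
  --kafka-version "3.5.1" \
  --number-of-broker-nodes 3 \
  --open-monitoring '{"Prometheus":{"JmxExporter":{"EnabledInBroker":true},"NodeExporter":{"EnabledInBroker":true}}}'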
Using the API
Invoke the CreateCluster operation and specify OpenMonitoring. Enable the jmxExporter,
the nodeExporter, or both. If you specify OpenMonitoring, the two exporters can't be
disabled at the same time.
Enable open monitoring on an existing Amazon MSK cluster
To enable open monitoring, make sure that the cluster is in the ACTIVE state.
Using the AWS Management Console
1. Sign in to the AWS Management Console, and open the Amazon MSK console at https://
console.aws.amazon.com/msk/home?region=us-east-1#/home/.
2. Choose the name of the cluster that you want to update. This takes you to a page that contains
details for the cluster.
3. On the Properties tab, scroll down to find the Monitoring section.
4. Choose Edit.
5. Select the check box next to Enable open monitoring with Prometheus.
6. Choose Save changes.
Using the AWS CLI
Invoke the update-monitoring command and specify its open-monitoring option. Enable
the JmxExporter, the NodeExporter, or both. If you specify open-monitoring, the two
exporters can't be disabled at the same time.
Using the API
Invoke the UpdateMonitoring operation and specify OpenMonitoring. Enable the
jmxExporter, the nodeExporter, or both. If you specify OpenMonitoring, the two
exporters can't be disabled at the same time.
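For example, the following is a minimal sketch of enabling both exporters with update-monitoring;
the cluster ARN and current cluster version are placeholder values.

aws kafka update-monitoring \
  --cluster-arn "ClusterArn" \
  --current-version "CurrentClusterVersion" \
  --open-monitoring '{"Prometheus":{"JmxExporter":{"EnabledInBroker":true},"NodeExporter":{"EnabledInBroker":true}}}'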
Set up a Prometheus host on an Amazon EC2 instance
1. Download the Prometheus server from https://prometheus.io/download/#prometheus to your
Amazon EC2 instance.
2. Extract the downloaded file to a directory and go to that directory.
3. Create a file with the following contents and name it prometheus.yml.
# file: prometheus.yml
# my global config
global:
  scrape_interval: 60s

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries
  # scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      # 9090 is the prometheus server port
      - targets: ['localhost:9090']
  - job_name: 'broker'
    file_sd_configs:
      - files:
          - 'targets.json'
4. Use the ListNodes operation to get a list of your cluster's brokers.
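For step 4, the following AWS CLI sketch is one way to list the broker endpoints; the cluster
ARN is a placeholder value.

aws kafka list-nodes \
  --cluster-arn "ClusterArn" \
  --query 'NodeInfoList[*].BrokerNodeInfo.Endpoints' \
  --output text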
5. Create a file named targets.json with the following JSON. Replace broker_dns_1,
broker_dns_2, and the rest of the broker DNS names with the DNS names you obtained for
your brokers in the previous step. Include all of the brokers you obtained in the previous step.
Amazon MSK uses port 11001 for the JMX Exporter and port 11002 for the Node Exporter.
ZooKeeper mode targets.json
[
  {
    "labels": {
      "job": "jmx"
    },
    "targets": [
      "broker_dns_1:11001",
      "broker_dns_2:11001",
      .
      .
      .
      "broker_dns_N:11001"
    ]
  },
  {
    "labels": {
      "job": "node"
    },
    "targets": [
      "broker_dns_1:11002",
      "broker_dns_2:11002",
      .
      .
      .
      "broker_dns_N:11002"
    ]
  }
]
KRaft mode targets.json
[
  {
    "labels": {
      "job": "jmx"
    },
    "targets": [
      "broker_dns_1:11001",
      "broker_dns_2:11001",
      .
      .
      .
      "broker_dns_N:11001",
      "controller_dns_1:11001",
      "controller_dns_2:11001",
      "controller_dns_3:11001"
    ]
  },
  {
    "labels": {
      "job": "node"
    },
    "targets": [
      "broker_dns_1:11002",
      "broker_dns_2:11002",
      .
      .
      .
      "broker_dns_N:11002"
    ]
  }
]
Note
To scrape JMX metrics from KRaft controllers, add controller DNS names as
targets in the JSON file. For example: controller_dns_1:11001, replacing
controller_dns_1 with the actual controller DNS name.
6. To start the Prometheus server on your Amazon EC2 instance, run the following command
in the directory where you extracted the Prometheus files and saved prometheus.yml and
targets.json.
./prometheus
7. Find the IPv4 public IP address of the Amazon EC2 instance where you ran Prometheus in the
previous step. You need this public IP address in the following step.
8. To access the Prometheus web UI, open a browser that can access your Amazon EC2 instance,
and go to Prometheus-Instance-Public-IP:9090, where Prometheus-Instance-Public-IP is the
public IP address you got in the previous step.
Use Prometheus metrics
All metrics emitted by Apache Kafka to JMX are accessible using open monitoring with
Prometheus. For information about Apache Kafka metrics, see Monitoring in the Apache Kafka
documentation. Along with Apache Kafka metrics, consumer-lag metrics are also available at port
11001 under the JMX MBean name kafka.consumer.group:type=ConsumerLagMetrics. You
can also use the Prometheus Node Exporter to get CPU and disk metrics for your brokers at port
11002.
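For example, assuming the exporters respond on the conventional /metrics path, you can verify
that a broker is emitting metrics with curl; broker_dns_1 is a placeholder for one of your
broker DNS names.

# JMX Exporter: Apache Kafka and consumer-lag metrics
curl http://broker_dns_1:11001/metrics
# Node Exporter: CPU and disk metrics for the broker host
curl http://broker_dns_1:11002/metrics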
Store Prometheus metrics in Amazon Managed Service for Prometheus
Amazon Managed Service for Prometheus is a Prometheus-compatible monitoring and alerting
service that you can use to monitor Amazon MSK clusters. It is a fully-managed service that
automatically scales the ingestion, storage, querying, and alerting of your metrics. It also
integrates with AWS security services to give you fast and secure access to your data. You can use
the open-source PromQL query language to query your metrics and alert on them.
For more information, see Getting started with Amazon Managed Service for Prometheus.
Use Amazon MSK storage capacity alerts
On Amazon MSK provisioned clusters, you choose the cluster's primary storage capacity. If you
exhaust the storage capacity on a broker in your provisioned cluster, it can affect its ability to
produce and consume data, leading to costly downtime. Amazon MSK offers CloudWatch metrics
to help you monitor your cluster's storage capacity. However, to make it easier for you to detect
and resolve storage capacity issues, Amazon MSK automatically sends you dynamic cluster storage
capacity alerts. The storage capacity alerts include recommendations for short-term and long-term
steps to manage your cluster's storage capacity. From the Amazon MSK console, you can use quick
links within the alerts to take recommended actions immediately.
There are two types of MSK storage capacity alerts: proactive and remedial.
Proactive ("Action required") storage capacity alerts warn you about potential storage issues
with your cluster. When a broker in an MSK cluster has used over 60% or 80% of its disk storage
capacity, you'll receive proactive alerts for the affected broker.
Remedial ("Critical action required") storage capacity alerts require you to take remedial action to
fix a critical cluster issue when one of the brokers in your MSK cluster has run out of disk storage
capacity.
Amazon MSK automatically sends these alerts to the Amazon MSK console, AWS Health
Dashboard, Amazon EventBridge, and email contacts for your AWS account. You can also configure
Amazon EventBridge to deliver these alerts to Slack or to tools such as New Relic and Datadog.
Storage capacity alerts are enabled by default for all MSK provisioned clusters and can't be turned
off. This feature is supported in all regions where MSK is available.
Monitor storage capacity alerts
You can check for storage capacity alerts in several ways:
Go to the Amazon MSK console. Storage capacity alerts are displayed in the cluster alerts pane
for 90 days. The alerts contain recommendations and single-click link actions to address disk
storage capacity issues.
Use the ListClusters, ListClustersV2, DescribeCluster, or DescribeClusterV2 API operations to
view CustomerActionStatus and all the alerts for a cluster (see the sketch after this list).
Go to the AWS Health Dashboard to view alerts from MSK and other AWS services.
Set up the AWS Health API and Amazon EventBridge to route alert notifications to third-party
platforms such as Datadog, New Relic, and Slack.
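For example, the following AWS CLI sketch checks a cluster's alert status; the cluster ARN is
a placeholder value, and the query path to CustomerActionStatus is an assumption based on the
field name mentioned in the list above.

aws kafka describe-cluster-v2 \
  --cluster-arn "ClusterArn" \
  --query 'ClusterInfo.CustomerActionStatus'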
Use LinkedIn's Cruise Control for Apache Kafka with
Amazon MSK
You can use LinkedIn's Cruise Control to rebalance your Amazon MSK cluster, detect and fix
anomalies, and monitor the state and health of the cluster.
To download and build Cruise Control
1. Create an Amazon EC2 instance in the same Amazon VPC as the Amazon MSK cluster.
2. Install Prometheus on the Amazon EC2 instance that you created in the previous step. Note
the private IP and the port. The default port number is 9090. For information on how to
configure Prometheus to aggregate metrics for your cluster, see the section called “Monitor
with Prometheus”.
3. Download Cruise Control on the Amazon EC2 instance. (Alternatively, you can use a separate
Amazon EC2 instance for Cruise Control if you prefer.) For a cluster that has Apache Kafka
version 2.4.*, use the latest 2.4.* Cruise Control release. If your cluster has an Apache Kafka
version that is older than 2.4.*, use the latest 2.0.* Cruise Control release.
4. Decompress the Cruise Control file, then go to the decompressed folder.
5. Run the following command to install git.
sudo yum -y install git
6. Run the following command to initialize the local repo. Replace Your-Cruise-Control-Folder
with the name of your current folder (the folder that you obtained when you decompressed the
Cruise Control download).

git init && git add . && git commit -m "Init local repo." && git tag -a Your-Cruise-Control-Folder -m "Init local version."
7. Run the following command to build the source code.
./gradlew jar copyDependantLibs
To configure and run Cruise Control
1. Make the following updates to the config/cruisecontrol.properties file. Replace the example
bootstrap-brokers string with the values for your cluster. To get this string for your cluster,
you can see the cluster details in the console. Alternatively, you can use the
GetBootstrapBrokers and DescribeCluster API operations or their CLI equivalents.
# If using TLS encryption, use 9094; use 9092 if using plaintext
bootstrap.servers=b-1.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:9094,b-2.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:9094,b-3.test-cluster.2skv42.c1.kafka.us-east-1.amazonaws.com:9094

# SSL properties, needed if cluster is using TLS encryption
security.protocol=SSL
ssl.truststore.location=/home/ec2-user/kafka.client.truststore.jks

# Use the Prometheus Metric Sampler
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler

# Prometheus Metric Sampler specific configuration
prometheus.server.endpoint=1.2.3.4:9090 # Replace with your Prometheus IP and port

# Change the capacity config file and specify its path; details below
capacity.config.file=config/capacityCores.json
2. Edit the config/capacityCores.json file to specify the right disk size, CPU cores, and
network in/out limits. You can use the DescribeCluster API operation (or its CLI equivalent) to
obtain the disk size. For CPU cores and network in/out limits, see Amazon EC2 Instance Types.
{
  "brokerCapacities": [
    {
      "brokerId": "-1",
      "capacity": {
        "DISK": "10000",
        "CPU": {
          "num.cores": "2"
        },
        "NW_IN": "5000000",
        "NW_OUT": "5000000"
      },
      "doc": "This is the default capacity. Capacity unit used for disk is in MB, cpu is in number of cores, network throughput is in KB."
    }
  ]
}
3. You can optionally install the Cruise Control UI. To download it, go to Setting Up Cruise
Control Frontend.
4. Run the following command to start Cruise Control. Consider using a tool like screen or tmux
to keep a long-running session open.

<path-to-your-kafka-installation>/bin/kafka-cruise-control-start.sh config/cruisecontrol.properties 9091
5. Use the Cruise Control APIs or the UI to make sure that Cruise Control has the cluster load
data and that it's making rebalancing suggestions. It might take several minutes to get a valid
window of metrics.
Use the automated deployment template of Cruise Control for Amazon MSK
You can also use this CloudFormation template to easily deploy Cruise Control and Prometheus
to gain deeper insights into your Amazon MSK cluster's performance and optimize resource
utilization.
Key features:
Automated provisioning of an Amazon EC2 instance with Cruise Control and Prometheus
pre-configured.
Support for Amazon MSK provisioned clusters.
Flexible authentication with PlainText and IAM.
No ZooKeeper dependency for Cruise Control.
Easily customize Prometheus targets, Cruise Control capacity settings, and other configurations
by providing your own configuration files stored in an Amazon S3 bucket.
Amazon MSK quota
Your AWS account has default quotas for Amazon MSK. Unless otherwise stated, each per-account
quota is region-specific within your AWS account.
Amazon MSK quota
Up to 90 brokers per account, 30 brokers per ZooKeeper mode cluster, and 60 brokers per KRaft
mode cluster. To request a higher quota, go to the Service Quotas console.
A minimum of 1 GiB of storage per broker.
A maximum of 16384 GiB of storage per broker.
A cluster that uses the section called “IAM access control” can have up to
3000 TCP connections per broker at any given time. To increase this limit, you
can adjust the listener.name.client_iam.max.connections or the
listener.name.client_iam_public.max.connections configuration property using the
Kafka AlterConfig API or the kafka-configs.sh tool (see the first sketch after this list). It's
important to note that increasing either property to a high value can result in unavailability.
Limits on TCP connections. With connection rate bursts enabled, MSK allows 100 connections
per second. The exception is the kafka.t3.small instance type, which is allowed 4 connections
per second with connection rate bursts enabled. Older clusters that don't have connection rate
bursts enabled will have the feature automatically enabled when the cluster is patched.
To handle retries on failed connections, you can set the reconnect.backoff.ms configuration
parameter on the client side. For example, if you want a client to retry connections after 1
second, set reconnect.backoff.ms to 1000 (see the second sketch after this list). For more
information, see reconnect.backoff.ms in the Apache Kafka documentation.
Up to 100 configurations per account. To request a limit increase through Service Quotas, go to
the Service Quotas console.
A maximum of 50 revisions per configuration.
To update the configuration or the Apache Kafka version of an MSK cluster, first ensure the
number of partitions per broker is under the limits described in the section called “Right-size
your cluster: Number of partitions per broker”.
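The following two sketches illustrate the quotas above. The first raises the per-broker IAM
connection limit with the kafka-configs.sh tool; the bootstrap broker string, broker ID, and new
limit are placeholder values, and client.properties is assumed to hold your client
authentication settings.

<path-to-your-kafka-installation>/bin/kafka-configs.sh \
  --bootstrap-server <your-bootstrap-broker-string> \
  --entity-type brokers --entity-name 1 \
  --alter --add-config listener.name.client_iam.max.connections=5000 \
  --command-config client.properties

The second is a client configuration sketch that sets the retry backoff described above.

# client.properties (producer or consumer)
# Retry failed connections after 1 second
reconnect.backoff.ms=1000
# Optional upper bound when the client backs off exponentially between retries
reconnect.backoff.max.ms=10000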
MSK Replicator quotas
A maximum of 15 MSK Replicators per account.
MSK Replicator only replicates up to 750 topics in sorted order. If you need to replicate more
topics, we recommend that you create a separate Replicator. Go to the Service Quotas console
if you need support for more than 750 topics per Replicator. You can monitor the number of
topics being replicated using the "TopicCount" metric.
A maximum ingress throughput of 1 GB per second per MSK Replicator. Request a higher quota
by going through the Service Quotas console.
MSK Replicator Record Size - A maximum of 10 MB record size (message.max.bytes). Request a
higher quota by going through the Service Quotas console.
MSK Serverless quota
Note
If you experience any issue with service quota limits, create a support case with your use
case and requested limit.
Limits are per cluster, unless otherwise stated.
Dimension | Quota | Quota violation result
Maximum ingress throughput | 200 MBps | Slowdown with throttle duration in response
Maximum egress throughput | 400 MBps | Slowdown with throttle duration in response
Maximum retention duration | Unlimited | N/A
Maximum number of client connections | 3000 | Connection close
Maximum connection attempts | 100 per second | Connection close
Maximum message size | 8 MB | Request fails with ErrorCode: INVALID_REQUEST
Maximum request rate | 15,000 per second | Slowdown with throttle duration in response
Maximum rate of topic management API requests | 2 per second | Slowdown with throttle duration in response
Maximum fetch bytes per request | 55 MB | Request fails with ErrorCode: INVALID_REQUEST
Maximum number of consumer groups | 500 | JoinGroup request fails
Maximum number of partitions (leaders) | 2400 for non-compacted topics; 120 for compacted topics. To request a service quota adjustment, create a support case with your use case and requested limit. | Request fails with ErrorCode: INVALID_REQUEST
Maximum rate of partition creation and deletion | 250 in 5 minutes | Request fails with ErrorCode: THROUGHPUT_QUOTA_EXCEEDED
Maximum ingress throughput per partition | 5 MBps | Slowdown with throttle duration in response
Maximum egress throughput per partition | 10 MBps | Slowdown with throttle duration in response
Maximum partition size (for compacted topics) | 250 GB | Request fails with ErrorCode: THROUGHPUT_QUOTA_EXCEEDED
Maximum number of client VPCs per serverless cluster | 5 |
Maximum number of serverless clusters per account | 10. To request a service quota adjustment, create a support case with your use case and requested limit. |
MSK Connect quota
Up to 100 custom plugins.
Up to 100 worker configurations.
Up to 60 connect workers. If a connector is set up to have auto scaled capacity, MSK Connect
uses the maximum number of workers that the connector is configured to have to calculate the
quota for the account.
Up to 10 workers per connector.
To request a higher quota for MSK Connect, go to the Service Quotas console.
Amazon MSK resources
The term resources has two meanings in Amazon MSK, depending on the context. In the context
of APIs, a resource is a structure on which you can invoke an operation. For a list of these resources
and the operations that you can invoke on them, see Resources in the Amazon MSK API Reference.
In the context of the section called “IAM access control”, a resource is an entity to which you can
allow or deny access, as defined in the section called “Authorization policy resources”.
MSK integrations
This section provides references to AWS features that integrate with Amazon MSK.
Topics
Amazon Athena connector for Amazon MSK
Amazon Redshift streaming data ingestion for Amazon MSK
Firehose integration for Amazon MSK
Access Amazon EventBridge Pipes through the Amazon MSK console
Amazon Athena connector for Amazon MSK
The Amazon Athena connector for Amazon MSK enables Amazon Athena to run SQL queries on
Apache Kafka topics. Use this connector to view Apache Kafka topics as tables and messages as
rows in Athena.
For more information, see Amazon Athena MSK Connector in the Amazon Athena User Guide.
Amazon Redshift streaming data ingestion for Amazon MSK
Amazon Redshift supports streaming ingestion from Amazon MSK. The Amazon Redshift streaming
ingestion feature provides low-latency, high-speed ingestion of streaming data from Amazon MSK
into an Amazon Redshift materialized view. Because it doesn't need to stage data in Amazon S3,
Amazon Redshift can ingest streaming data at a lower latency and at a reduced storage cost. You
can configure Amazon Redshift streaming ingestion on an Amazon Redshift cluster using SQL
statements to authenticate and connect to an Amazon MSK topic.
For more information, see Streaming ingestion in the Amazon Redshift Database Developer Guide.
Firehose integration for Amazon MSK
Amazon MSK integrates with Firehose to provide a serverless, no-code solution to deliver
streams from Apache Kafka clusters to Amazon S3 data lakes. Firehose is a streaming extract,
transform, and load (ETL) service that reads data from your Amazon MSK Kafka topics, performs
transformations such as conversion to Parquet, and aggregates and writes the data to Amazon S3.
With a few clicks from the console, you can set up a Firehose stream to read from a Kafka topic and
deliver to an S3 location. There is no code to write, no connector applications, and no resources to
provision. Firehose automatically scales based on the amount of data published to the Kafka topic,
and you only pay for the bytes ingested from Kafka.
See the following for more information about this feature.
Writing to Kinesis Data Firehose Using Amazon MSK - Amazon Kinesis Data Firehose in the
Amazon Data Firehose Developer Guide
Blog: Amazon MSK Introduces Managed Data Delivery from Apache Kafka to Your Data Lake
Lab: Delivery to Amazon S3 using Firehose
Access Amazon EventBridge Pipes through the Amazon MSK
console
Amazon EventBridge Pipes connects sources to targets. Pipes are intended for point-to-point
integrations between supported sources and targets, with support for advanced transformations
and enrichment. EventBridge Pipes provide a highly scalable way to connect your Amazon MSK
cluster to AWS services such as Step Functions, Amazon SQS, and API Gateway, as well as third-
party software as a service (SaaS) applications like Salesforce.
To set up a pipe, you choose the source, add optional filtering, define optional enrichment, and
choose the target for the event data.
On the details page for an Amazon MSK cluster, you can view the pipes that use that cluster as
their source. From there, you can also:
Launch the EventBridge console to view pipe details.
Launch the EventBridge console to create a new pipe with the cluster as its source.
For more information on configuring an Amazon MSK cluster as a pipe source, see Amazon
Managed Streaming for Apache Kafka cluster as a source in the Amazon EventBridge User Guide. For
more information about EventBridge Pipes in general, see EventBridge Pipes.
To access EventBridge pipes for a given Amazon MSK cluster
1. Open the Amazon MSK console and choose Clusters.
2. Select a cluster.
3. On the cluster detail page, choose the Integration tab.
The Integration tab includes a list of any pipes currently configured to use the selected cluster
as a source, including:
pipe name
current status
pipe target
when the pipe was last modified
4. Manage the pipes for your Amazon MSK cluster as desired:
To access more details about a pipe
Choose the pipe.
This launches the Pipe details page of the EventBridge console.
To create a new pipe
Choose Connect Amazon MSK cluster to pipe.
This launches the Create pipe page of the EventBridge console, with the Amazon MSK
cluster specified as the pipe source. For more information, see Creating an EventBridge pipe
in the Amazon EventBridge User Guide.
You can also create a pipe for a cluster from the Clusters page. Select the cluster, and, from
the Actions menu, select Create EventBridge Pipe.
Apache Kafka versions
When you create an Amazon MSK cluster, you specify which Apache Kafka version you want to
have on it. You can also update the Apache Kafka version of an existing cluster. The topics in
this chapter help you understand timelines for Kafka version support and offer suggestions for
best practices.
Topics
Supported Apache Kafka versions
Amazon MSK version support
Supported Apache Kafka versions
Amazon Managed Streaming for Apache Kafka (Amazon MSK) supports the following Apache Kafka
and Amazon MSK versions. The Apache Kafka community provides approximately 12 months of
support for a version after its release date. For more details, see the Apache Kafka EOL (end of
life) policy.
Supported Apache Kafka versions
Apache Kafka version MSK release date End of support date
1.1.1 -- 2024-06-05
2.1.0 -- 2024-06-05
2.2.1 2019-07-31 2024-06-08
2.3.1 2019-12-19 2024-06-08
2.4.1 2020-04-02 2024-06-08
2.4.1.1 2020-09-09 2024-06-08
2.5.1 2020-09-30 2024-06-08
2.6.0 2020-10-21 2024-09-11
2.6.1 2021-01-19 2024-09-11
2.6.2 2021-04-29 2024-09-11
2.6.3 2021-12-21 2024-09-11
2.7.0 2020-12-29 2024-09-11
2.7.1 2021-05-25 2024-09-11
2.7.2 2021-12-21 2024-09-11
2.8.0 -- 2024-09-11
2.8.1 2022-10-28 2024-09-11
2.8.2-tiered 2022-10-28 2025-01-14
3.1.1 2022-06-22 2024-09-11
3.2.0 2022-06-22 2024-09-11
3.3.1 2022-10-26 2024-09-11
3.3.2 2023-03-02 2024-09-11
3.4.0 2023-05-04 2025-06-17
3.5.1 (recommended) 2023-09-26 --
3.6.0 2023-11-16 --
3.7.x 2024-05-29 --
For more information on Amazon MSK version support policy, see Amazon MSK version support
policy.
Apache Kafka version 3.7.x (with production-ready tiered storage)
Apache Kafka version 3.7.x on MSK includes support for Apache Kafka version 3.7.0. You can create
clusters or upgrade existing clusters to use the new 3.7.x version. With this change in version
naming, you no longer have to adopt newer patch fix versions such as 3.7.1 when they are released
by the Apache Kafka community. Amazon MSK will automatically update 3.7.x to support future
patch versions once they become available. This allows you to benefit from the security and bug
fixes available through patch fix versions without triggering a version upgrade. These patch fix
versions released by Apache Kafka don't break version compatibility and you can benefit from the
new patch fix versions without worrying about read or write errors for your client applications.
Make sure your infrastructure automation tools, such as CloudFormation, are updated to
account for this change in version naming.
Amazon MSK now supports KRaft mode (Apache Kafka Raft) in Apache Kafka version 3.7.x. On
Amazon MSK, as with ZooKeeper nodes, KRaft controllers are included at no additional cost to
you, and require no additional setup or management from you. You can now create clusters in
either KRaft mode or ZooKeeper mode on Apache Kafka version 3.7.x. In KRaft mode, you can add
up to 60 brokers to host more partitions per cluster, without requesting a limit increase, compared
to the 30-broker quota on ZooKeeper-based clusters. To learn more about KRaft on MSK, see KRaft
mode.
Apache Kafka version 3.7.x also includes several bug fixes and new features that improve
performance. Key improvements include leader discovery optimizations for clients and log
segment flush optimization options. For a complete list of improvements and bug fixes, see the
Apache Kafka release notes for 3.7.0.
Apache Kafka version 3.6.0 (with production-ready tiered storage)
For information about Apache Kafka version 3.6.0 (with production-ready tiered storage), see its
release notes on the Apache Kafka downloads site.
Amazon MSK will continue to use and manage ZooKeeper for quorum management in this release
for stability.
Amazon MSK version 3.5.1
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version
3.5.1 for new and existing clusters. Apache Kafka 3.5.1 includes several bug fixes and new features
that improve performance. Key features include the introduction of new rack-aware partition
assignment for consumers. Amazon MSK will continue to use and manage ZooKeeper for quorum
management in this release. For a complete list of improvements and bug fixes, see the Apache
Kafka release notes for 3.5.1.
For information about Apache Kafka version 3.5.1, see its release notes on the Apache Kafka
downloads site.
Amazon MSK version 3.4.0
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version
3.4.0 for new and existing clusters. Apache Kafka 3.4.0 includes several bug fixes and new features
that improve performance. Key features include a fix to improve stability to fetch from the closest
replica. Amazon MSK will continue to use and manage ZooKeeper for quorum management in this
release. For a complete list of improvements and bug fixes, see the Apache Kafka release notes for
3.4.0.
For information about Apache Kafka version 3.4.0, see its release notes on the Apache Kafka
downloads site.
Amazon MSK version 3.3.2
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version
3.3.2 for new and existing clusters. Apache Kafka 3.3.2 includes several bug fixes and new features
that improve performance. Key features include a fix to improve stability to fetch from the closest
replica. Amazon MSK will continue to use and manage ZooKeeper for quorum management in this
release. For a complete list of improvements and bug fixes, see the Apache Kafka release notes for
3.3.2.
For information about Apache Kafka version 3.3.2, see its release notes on the Apache Kafka
downloads site.
Amazon MSK version 3.3.1
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version
3.3.1 for new and existing clusters. Apache Kafka 3.3.1 includes several bug fixes and new features
that improve performance. Some of the key features include enhancements to metrics and
partitioner. Amazon MSK will continue to use and manage ZooKeeper for quorum management in
this release for stability. For a complete list of improvements and bug fixes, see the Apache Kafka
release notes for 3.3.1.
For information about Apache Kafka version 3.3.1, see its release notes on the Apache Kafka
downloads site.
Amazon MSK version 3.1.1
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka versions
3.1.1 and 3.2.0 for new and existing clusters. Apache Kafka 3.1.1 and Apache Kafka 3.2.0 include
several bug fixes and new features that improve performance. Some of the key features include
enhancements to metrics and the use of topic IDs. MSK will continue to use and manage ZooKeeper
for quorum management in this release for stability. For a complete list of improvements and bug
fixes, see the Apache Kafka release notes for 3.1.1 and 3.2.0.
For information about Apache Kafka versions 3.1.1 and 3.2.0, see the 3.2.0 release notes and 3.1.1
release notes on the Apache Kafka downloads site.
Amazon MSK tiered storage version 2.8.2.tiered
This release is an Amazon MSK-only version of Apache Kafka version 2.8.2, and is compatible with
open source Apache Kafka clients.
The 2.8.2.tiered release contains tiered storage functionality that is compatible with APIs
introduced in KIP-405 for Apache Kafka. For more information about the Amazon MSK tiered
storage feature, see Tiered storage for Amazon MSK clusters.
Apache Kafka version 2.5.1
Apache Kafka version 2.5.1 includes several bug fixes and new features, including encryption in-
transit for Apache ZooKeeper and administration clients. Amazon MSK provides TLS ZooKeeper
endpoints, which you can query with the DescribeCluster operation.
The output of the DescribeCluster operation includes the ZookeeperConnectStringTls node,
which lists the TLS ZooKeeper endpoints.
The following example shows the ZookeeperConnectStringTls node of the response for the
DescribeCluster operation:
"ZookeeperConnectStringTls": "z-3.awskafkatutorialc.abcd123.c3.kafka.us-
east-1.amazonaws.com:2182,z-2.awskafkatutorialc.abcd123.c3.kafka.us-
east-1.amazonaws.com:2182,z-1.awskafkatutorialc.abcd123.c3.kafka.us-
east-1.amazonaws.com:2182"
For information about using TLS encryption with ZooKeeper, see Using TLS security with Apache
ZooKeeper.
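For example, here is a sketch of retrieving just this field with the AWS CLI; ClusterArn is a
placeholder for your cluster's ARN:
aws kafka describe-cluster --cluster-arn ClusterArn --query 'ClusterInfo.ZookeeperConnectStringTls' --output text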
For more information about Apache Kafka version 2.5.1, see its release notes on the Apache Kafka
downloads site.
Amazon MSK bug-fix version 2.4.1.1
This release is an Amazon MSK-only bug-fix version of Apache Kafka version 2.4.1. This bug-fix
release contains a fix for KAFKA-9752, a rare issue that causes consumer groups to continuously
rebalance and remain in the PreparingRebalance state. This issue affects clusters running
Apache Kafka versions 2.3.1 and 2.4.1. This release contains a community-produced fix that is
available in Apache Kafka version 2.5.0.
Note
Amazon MSK clusters running version 2.4.1.1 are compatible with any Apache Kafka client
that is compatible with Apache Kafka version 2.4.1.
We recommend that you use MSK bug-fix version 2.4.1.1 for new Amazon MSK clusters if you
prefer to use Apache Kafka 2.4.1. You can update existing clusters running Apache Kafka version
2.4.1 to this version to incorporate this fix. For information about upgrading an existing cluster, see
Update the Apache Kafka version.
To work around this issue without upgrading the cluster to version 2.4.1.1, see the Consumer group
stuck in PreparingRebalance state section of the Troubleshoot your Amazon MSK cluster guide.
Apache Kafka version 2.4.1 (use 2.4.1.1 instead)
Note
You can no longer create an MSK cluster with Apache Kafka version 2.4.1. Instead, you
can use Amazon MSK bug-fix version 2.4.1.1 with clients compatible with Apache Kafka
version 2.4.1. If you already have an MSK cluster with Apache Kafka version 2.4.1, we
recommend that you update it to use Apache Kafka version 2.4.1.1 instead.
KIP-392 is one of the key Kafka Improvement Proposals that are included in the 2.4.1 release of
Apache Kafka. This improvement allows consumers to fetch from the closest replica. To use this
feature, set client.rack in the consumer properties to the ID of the consumer's Availability
Zone. An example AZ ID is use1-az1. Amazon MSK sets broker.rack to the IDs of the
Availability Zones of the brokers. You must also set the replica.selector.class configuration
property to org.apache.kafka.common.replica.RackAwareReplicaSelector, which is an
implementation of rack awareness provided by Apache Kafka.
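A minimal sketch of the two settings follows; use1-az1 is a placeholder for the consumer's actual
AZ ID. client.rack belongs in the consumer configuration, and replica.selector.class belongs in the
cluster (broker-side) configuration:
# consumer.properties: identify the consumer's Availability Zone
client.rack=use1-az1
# cluster configuration: enable rack-aware replica selection on the brokers
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector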
When you use this version of Apache Kafka, the metrics in the PER_TOPIC_PER_BROKER
monitoring level appear only after their values become nonzero for the first time. For more
information about this, see the section called “PER_TOPIC_PER_BROKER Level monitoring”.
For information about how to find Availability Zone IDs, see AZ IDs for Your Resource in the AWS
Resource Access Manager user guide.
For information about setting configuration properties, see Amazon MSK configuration.
For more information about KIP-392, see Allow Consumers to Fetch from Closest Replica in the
Confluence pages.
For more information about Apache Kafka version 2.4.1, see its release notes on the Apache Kafka
downloads site.
Amazon MSK version support
This topic describes the Amazon MSK version support policy and the procedure for updating the
Apache Kafka version. If you're upgrading your Kafka version, follow the best practices outlined in
Best practices for version upgrades.
Topics
Amazon MSK version support policy
Update the Apache Kafka version
Best practices for version upgrades
Amazon MSK version support policy
This section describes the support policy for Amazon MSK supported Kafka versions.
All Kafka versions are supported until they reach their end of support date. For details on
end of support dates, see Supported Apache Kafka versions. Upgrade your MSK cluster to the
recommended Kafka version or a higher version before the end of support date. For details on
updating your Apache Kafka version, see Update the Apache Kafka version. A cluster using a
Kafka version past its end of support date is auto-upgraded to the recommended Kafka version.
Automatic updates can happen at any time after the end of support date, and you will not receive
any notification before the update.
MSK also phases out the ability to create new clusters with Kafka versions that have a published
end of support date.
Update the Apache Kafka version
You can update an existing MSK cluster to a newer version of Apache Kafka. You can't update
it to an older version. When you update the Apache Kafka version of an MSK cluster, also check
your client-side software to make sure its version enables you to use the features of the cluster's
new Apache Kafka version. Amazon MSK only updates the server software. It doesn't update your
clients.
For information about how to make a cluster highly available during an update, see the section
called “Build highly available clusters”.
Update the Apache Kafka version using the AWS Management Console
1. Open the Amazon MSK console at https://console.aws.amazon.com/msk/.
2. Choose the MSK cluster on which you want to update the Apache Kafka version.
3. On the Properties tab, choose Upgrade in the Apache Kafka version section.
Update the Apache Kafka version using the AWS CLI
1. Run the following command, replacing ClusterArn with the Amazon Resource Name
(ARN) that you obtained when you created your cluster. If you don't have the ARN for your
cluster, you can find it by listing all clusters. For more information, see the section called “List
clusters”.
aws kafka get-compatible-kafka-versions --cluster-arn ClusterArn
The output of this command includes a list of the Apache Kafka versions to which you can
update the cluster. It looks like the following example.
{
    "CompatibleKafkaVersions": [
        {
            "SourceVersion": "2.2.1",
            "TargetVersions": [
                "2.3.1",
                "2.4.1",
                "2.4.1.1",
                "2.5.1"
            ]
        }
    ]
}
2. Run the following command, replacing ClusterArn with the Amazon Resource Name
(ARN) that you obtained when you created your cluster. If you don't have the ARN for your
cluster, you can find it by listing all clusters. For more information, see the section called “List
clusters”.
Replace Current-Cluster-Version with the current version of the cluster. For
TargetVersion you can specify any of the target versions from the output of the previous
command.
Important
Cluster versions aren't simple integers. To find the current version of the cluster, use
the DescribeCluster operation or the describe-cluster AWS CLI command. An example
version is KTVPDKIKX0DER.
aws kafka update-cluster-kafka-version --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-kafka-version TargetVersion
The output of the previous command looks like the following JSON.
{
    "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
    "ClusterOperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef"
}
3. To get the result of the update-cluster-kafka-version operation, run the following
command, replacing ClusterOperationArn with the ARN that you obtained in the output of
the update-cluster-kafka-version command.
aws kafka describe-cluster-operation --cluster-operation-arn ClusterOperationArn
The output of this describe-cluster-operation command looks like the following JSON
example.
{
    "ClusterOperationInfo": {
        "ClientRequestId": "62cd41d2-1206-4ebf-85a8-dbb2ba0fe259",
        "ClusterArn": "arn:aws:kafka:us-east-1:012345678012:cluster/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2",
        "CreationTime": "2021-03-11T20:34:59.648000+00:00",
        "OperationArn": "arn:aws:kafka:us-east-1:012345678012:cluster-operation/exampleClusterName/abcdefab-1234-abcd-5678-cdef0123ab01-2/0123abcd-abcd-4f7f-1234-9876543210ef",
        "OperationState": "UPDATE_IN_PROGRESS",
        "OperationSteps": [
            {
                "StepInfo": {
                    "StepStatus": "IN_PROGRESS"
                },
                "StepName": "INITIALIZE_UPDATE"
            },
            {
                "StepInfo": {
                    "StepStatus": "PENDING"
                },
                "StepName": "UPDATE_APACHE_KAFKA_BINARIES"
            },
            {
                "StepInfo": {
                    "StepStatus": "PENDING"
                },
                "StepName": "FINALIZE_UPDATE"
            }
        ],
        "OperationType": "UPDATE_CLUSTER_KAFKA_VERSION",
        "SourceClusterInfo": {
            "KafkaVersion": "2.4.1"
        },
        "TargetClusterInfo": {
            "KafkaVersion": "2.6.1"
        }
    }
}
If OperationState has the value UPDATE_IN_PROGRESS, wait a while, then run the
describe-cluster-operation command again. When the operation is complete, the value
of OperationState becomes UPDATE_COMPLETE. Because the time required for Amazon
MSK to complete the operation varies, you might need to check repeatedly until the operation
is complete.
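For example, here is a sketch of polling the operation state until it leaves UPDATE_IN_PROGRESS;
ClusterOperationArn is a placeholder:
while true; do
    state=$(aws kafka describe-cluster-operation \
        --cluster-operation-arn ClusterOperationArn \
        --query 'ClusterOperationInfo.OperationState' --output text)
    echo "$state"
    if [ "$state" != "UPDATE_IN_PROGRESS" ]; then break; fi
    sleep 60
done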
Update the Apache Kafka version using the API
1. Invoke the GetCompatibleKafkaVersions operation to get a list of the Apache Kafka versions to
which you can update the cluster.
2. Invoke the UpdateClusterKafkaVersion operation to update the cluster to one of the
compatible Apache Kafka versions.
Best practices for version upgrades
To ensure client continuity during the rolling update that is performed as part of the Kafka version
upgrade process, review the configuration of your clients and your Apache Kafka topics as follows:
• Set the topic replication factor (RF) to a minimum value of 2 for two-AZ clusters and a minimum
value of 3 for three-AZ clusters. An RF value of 2 can lead to offline partitions during patching.
• Set minimum in-sync replicas (minISR) to a maximum value of RF - 1 to ensure the partition
replica set can tolerate one replica being offline or under-replicated (see the sketch after this list).
• Configure clients to use multiple broker connection strings. Having multiple brokers in a client's
connection string allows for failover if a specific broker supporting client I/O begins to be
patched. For information about how to get a connection string with multiple brokers, see Getting
the bootstrap brokers for an Amazon MSK cluster.
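As a sketch, the following command sets minISR to 2 for one topic; $bs and my-topic are
placeholders for your bootstrap-broker string and topic name:
<path-to-your-kafka-installation>/bin/kafka-configs.sh --bootstrap-server $bs --alter --entity-type topics --entity-name my-topic --add-config min.insync.replicas=2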
We recommend that you upgrade connecting clients to the recommended version or above to
benefit from the features available in the new version. Client upgrades are not subject to the end
of life (EOL) dates of your MSK cluster's Kafka version, and do not need to be completed by the
EOL date. Apache Kafka provides a bi-directional client compatibility policy that allows older
clients to work with newer clusters and vice versa.
Kafka clients using versions 3.x.x are likely to come with the following defaults: acks=all and
enable.idempotence=true. acks=all is different from the previous default of acks=1
and provides extra durability by ensuring that all in-sync replicas acknowledge the produce
request. Similarly, the default for enable.idempotence was previously false. The change to
enable.idempotence=true as the default lowers the likelihood of duplicate messages. These
changes are considered best practice settings and may introduce a small amount of additional
latency that's within normal performance parameters.
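Rather than relying on client defaults, you can pin these values explicitly in the producer
configuration; a minimal sketch:
# producer.properties: pin the 3.x durability defaults explicitly
acks=all
enable.idempotence=true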
Use the recommended Kafka version when creating new MSK clusters. Using the recommended
Kafka version allows you to benefit from the latest Kafka and MSK features.
Troubleshoot your Amazon MSK cluster
The following information can help you troubleshoot problems that you might have with your
Amazon MSK cluster. You can also post your issue to AWS re:Post. For troubleshooting Amazon
MSK Replicator, see Troubleshoot MSK Replicator.
Topics
Volume replacement causes disk saturation due to replication overload
Consumer group stuck in PreparingRebalance state
Error delivering broker logs to Amazon CloudWatch Logs
No default security group
Cluster appears stuck in the CREATING state
Cluster state goes from CREATING to FAILED
Cluster state is ACTIVE but producers cannot send data or consumers cannot receive data
AWS CLI doesn't recognize Amazon MSK
Partitions go offline or replicas are out of sync
Disk space is running low
Memory running low
Producer gets NotLeaderForPartitionException
Under-replicated partitions (URP) greater than zero
Cluster has topics called __amazon_msk_canary and __amazon_msk_canary_state
Partition replication fails
Unable to access cluster that has public access turned on
Unable to access cluster from within AWS: Networking issues
Failed authentication: Too many connects
MSK Serverless: Cluster creation fails
Volume replacement causes disk saturation due to replication
overload
During unplanned volume hardware failure, Amazon MSK may replace the volume with a new
one. Kafka repopulates the new volume by replicating partitions from other brokers in the
cluster. Once partitions are replicated and caught up, they are eligible for leadership and in-sync
replica (ISR) membership.
Problem
In a broker recovering from volume replacement, some partitions of varying sizes may come
back online before others. This can be problematic because those partitions can be serving traffic
from the same broker that is still catching up on (replicating) other partitions. This replication
traffic can sometimes saturate the underlying volume throughput limits, which default to 250 MiB
per second. When this saturation occurs, any partitions that are already caught up will be
impacted, resulting in latency across the cluster for any brokers sharing ISR with those caught-up
partitions (not just leader partitions, due to remote acks with acks=all). This problem is more
common with larger clusters that have larger numbers of partitions that vary in size.
Recommendation
• To improve replication I/O posture, ensure that best practice thread settings are in place.
• To reduce the likelihood of underlying volume saturation, enable provisioned storage with a
higher throughput. A minimum throughput value of 500 MiB/s is recommended for high-throughput
replication cases, but the actual value needed will vary with throughput and use case. See
Provision storage throughput for brokers in an Amazon MSK cluster, and the sketch after this list.
• To minimize replication pressure, lower num.replica.fetchers to the default value of 2.
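As a sketch, provisioned storage throughput can be enabled with the update-storage command;
the ARN, current cluster version, and sizes are placeholders, and you should confirm the exact
parameters with aws kafka update-storage help:
aws kafka update-storage --cluster-arn ClusterArn \
    --current-version Current-Cluster-Version \
    --provisioned-throughput Enabled=true,VolumeThroughput=500 \
    --volume-size-gb 1100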
Consumer group stuck in PreparingRebalance state
If one or more of your consumer groups is stuck in a perpetual rebalancing state, the cause might
be Apache Kafka issue KAFKA-9752, which affects Apache Kafka versions 2.3.1 and 2.4.1.
To resolve this issue, we recommend that you upgrade your cluster to Amazon MSK bug-fix version
2.4.1.1, which contains a fix for this issue. For information about updating an existing cluster to
Amazon MSK bug-fix version 2.4.1.1, see Update the Apache Kafka version.
The workarounds for solving this issue without upgrading the cluster to Amazon MSK bug-fix
version 2.4.1.1 are to either set the Kafka clients to use the static membership protocol, or to
identify and reboot the coordinating broker node of the stuck consumer group.
Implementing static membership protocol
To implement Static Membership Protocol in your clients, do the following:
1. Set the group.instance.id property of your Kafka consumer configuration to a static
string that identifies the consumer in the group.
2. Ensure that other instances of the configuration are updated to use the static string.
3. Deploy the changes to your Kafka consumers.
Using Static Membership Protocol is more effective if the session timeout in the client
configuration is set to a duration that allows the consumer to recover without prematurely
triggering a consumer group rebalance. For example, if your consumer application can tolerate 5
minutes of unavailability, a reasonable value for the session timeout would be 4 minutes instead of
the default value of 10 seconds.
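A minimal consumer configuration sketch for a consumer that can tolerate about 5 minutes of
unavailability; consumer-host-1 is a placeholder instance ID:
# consumer.properties: static membership with a longer session timeout
group.instance.id=consumer-host-1
session.timeout.ms=240000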
Note
Using Static Membership Protocol only reduces the probability of encountering this issue.
You may still encounter this issue even when using Static Membership Protocol.
Rebooting the coordinating broker node
To reboot the coordinating broker node, do the following:
1. Identify the group coordinator using the kafka-consumer-groups.sh command.
2. Restart the group coordinator of the stuck consumer group using the RebootBroker API
action.
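A sketch of both steps; $bs, my-group, ClusterArn, and the broker ID are placeholders:
# Step 1: the COORDINATOR column identifies the coordinating broker
<path-to-your-kafka-installation>/bin/kafka-consumer-groups.sh --bootstrap-server $bs --describe --group my-group --state
# Step 2: reboot that broker through the Amazon MSK RebootBroker API
aws kafka reboot-broker --cluster-arn ClusterArn --broker-ids 2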
Error delivering broker logs to Amazon CloudWatch Logs
When you try to set up your cluster to send broker logs to Amazon CloudWatch Logs, you might
get one of two exceptions.
If you get an InvalidInput.LengthOfCloudWatchResourcePolicyLimitExceeded
exception, try again but use log groups that start with /aws/vendedlogs/. For more information,
see Enabling Logging from Certain Amazon Web Services.
If you get an InvalidInput.NumberOfCloudWatchResourcePoliciesLimitExceeded
exception, choose an existing Amazon CloudWatch Logs policy in your account, and append the
following JSON to it.
{"Sid":"AWSLogDeliveryWrite","Effect":"Allow","Principal":
{"Service":"delivery.logs.amazonaws.com"},"Action":
["logs:CreateLogStream","logs:PutLogEvents"],"Resource":["*"]}
If you try to append the JSON above to an existing policy but get an error that says you've reached
the maximum length for the policy you picked, try to append the JSON to another one of your
Amazon CloudWatch Logs policies. After you append the JSON to an existing policy, try once again
to set up broker-log delivery to Amazon CloudWatch Logs.
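As a sketch, you can list your existing resource policies, merge the statement above into one
policy's document, and write the merged document back; my-existing-policy and the file name are
placeholders:
aws logs describe-resource-policies
aws logs put-resource-policy --policy-name my-existing-policy \
    --policy-document file://merged-policy.json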
No default security group
If you try to create a cluster and get an error indicating that there's no default security group, it
might be because you are using a VPC that was shared with you. Ask your administrator to grant
you permission to describe the security groups on this VPC and try again. For an example of a
policy that allows this action, see Amazon EC2: Allows Managing EC2 Security Groups Associated
With a Specific VPC, Programmatically and in the Console.
Cluster appears stuck in the CREATING state
Sometimes cluster creation can take up to 30 minutes. Wait for 30 minutes and check the state of
the cluster again.
Cluster state goes from CREATING to FAILED
Try creating the cluster again.
Cluster state is ACTIVE but producers cannot send data or
consumers cannot receive data
If the cluster creation succeeds (the cluster state is ACTIVE), but you can't send or receive data,
ensure that your producer and consumer applications have access to the cluster. For more
information, see the guidance in the section called “Create a client machine”.
If your producers and consumers have access to the cluster but still experience problems
producing and consuming data, the cause might be KAFKA-7697, which affects Apache Kafka
version 2.1.0 and can lead to a deadlock in one or more brokers. Consider migrating to Apache
Kafka 2.2.1, which is not affected by this bug. For information about how to migrate, see Migrate
to Amazon MSK Cluster.
AWS CLI doesn't recognize Amazon MSK
If you have the AWS CLI installed, but it doesn't recognize the Amazon MSK commands, upgrade
your AWS CLI to the latest version. For detailed instructions on how to upgrade the AWS CLI, see
Installing the AWS Command Line Interface. For information about how to use the AWS CLI to run
Amazon MSK commands, see How it works.
Partitions go offline or replicas are out of sync
These can be symptoms of low disk space. See the section called “Disk space is running low”.
Disk space is running low
See the following best practices for managing disk space: the section called “Monitor disk space”
and the section called “Adjust data retention parameters”.
Memory running low
If you see the MemoryUsed metric running high or MemoryFree running low, that doesn't mean
there's a problem. Apache Kafka is designed to use as much memory as possible, and it manages it
optimally.
Producer gets NotLeaderForPartitionException
This is often a transient error. Set the producer's retries configuration parameter to a value
that's higher than its current value.
Under-replicated partitions (URP) greater than zero
The UnderReplicatedPartitions metric is an important one to monitor. In a healthy MSK
cluster, this metric has the value 0. If it's greater than zero, that might be for one of the following
reasons.
• If UnderReplicatedPartitions is spiky, the issue might be that the cluster isn't provisioned
at the right size to handle incoming and outgoing traffic. See Best practices.
• If UnderReplicatedPartitions is consistently greater than 0, including during low-traffic
periods, the issue might be that you've set restrictive ACLs that don't grant topic access to
brokers. To replicate partitions, brokers must be authorized to both READ and DESCRIBE topics.
DESCRIBE is granted by default with the READ authorization. For information about setting ACLs,
see Authorization and ACLs in the Apache Kafka documentation.
Cluster has topics called __amazon_msk_canary and
__amazon_msk_canary_state
You might see that your MSK cluster has a topic with the name __amazon_msk_canary and
another with the name __amazon_msk_canary_state. These are internal topics that Amazon
MSK creates and uses for cluster health and diagnostic metrics. These topics are negligible in size
and can't be deleted.
Partition replication fails
Ensure that you haven't set ACLs on CLUSTER_ACTIONS.
Unable to access cluster that has public access turned on
If your cluster has public access turned on, but you still cannot access it from the internet, follow
these steps:
1. Ensure that the cluster's security group's inbound rules allow your IP address and the cluster's
port. For a list of cluster port numbers, see the section called “Port information”. Also
ensure that the security group's outbound rules allow outbound communications. For more
information about security groups and their inbound and outbound rules, see Security groups
for your VPC in the Amazon VPC User Guide.
2. Make sure that your IP address and the cluster's port are allowed in the inbound rules of the
cluster's VPC network ACL. Unlike security groups, network ACLs are stateless. This means that
you must configure both inbound and outbound rules. In the outbound rules, allow all traffic
(port range: 0-65535) to your IP address. For more information, see Add and delete rules in the
Amazon VPC User Guide.
3. Make sure that you are using the public-access bootstrap-brokers string to access the cluster.
An MSK cluster that has public access turned on has two different bootstrap-brokers strings,
one for public access, and one for access from within AWS. For more information, see the
section called “Get the bootstrap brokers using the AWS Management Console”.
Unable to access cluster from within AWS: Networking issues
If you have an Apache Kafka application that is unable to communicate successfully with an MSK
cluster, start by performing the following connectivity test.
1. Use any of the methods described in the section called “Get the bootstrap brokers for an
Amazon MSK cluster” to get the addresses of the bootstrap brokers.
2. In the following command, replace bootstrap-broker with one of the broker addresses that
you obtained in the previous step. Replace port-number with 9094 if the cluster is set up to
use TLS authentication, or with 9092 if it doesn't. A different port number is required if public
access is enabled. Run the command from the client machine.
telnet bootstrap-broker port-number
3. Repeat the previous command for all the bootstrap brokers.
If the client machine is able to access the brokers, this means there are no connectivity issues. In
this case, run the following command to check whether your Apache Kafka client is set up correctly.
To get bootstrap-brokers, use any of the methods described in the section called “Get the
bootstrap brokers for an Amazon MSK cluster”. Replace topic with the name of your topic.
<path-to-your-kafka-installation>/bin/kafka-console-producer.sh --broker-list bootstrap-brokers --producer.config client.properties --topic topic
If the previous command succeeds, this means that your client is set up correctly. If you're still
unable to produce and consume from an application, debug the problem at the application level.
If the client machine is unable to access the brokers, see the following subsections for guidance
that is based on your client-machine setup.
Amazon EC2 client and MSK cluster in the same VPC
If the client machine is in the same VPC as the MSK cluster, make sure the cluster's security group
has an inbound rule that accepts traffic from the client machine's security group. For information
about setting up these rules, see Security Group Rules. For an example of how to access a cluster
from an Amazon EC2 instance that's in the same VPC as the cluster, see Get started.
Amazon EC2 client and MSK cluster in different VPCs
If the client machine and the cluster are in two different VPCs, ensure the following:
• The two VPCs are peered.
• The status of the peering connection is active.
• The route tables of the two VPCs are set up correctly.
For information about VPC peering, see Working with VPC Peering Connections.
On-premises client
In the case of an on-premises client that is set up to connect to the MSK cluster using AWS VPN,
ensure the following:
• The VPN connection status is UP. For information about how to check the VPN connection
status, see How do I check the current status of my VPN tunnel?.
• The route table of the cluster's VPC contains the route for an on-premises CIDR whose target
has the format Virtual private gateway (vgw-xxxxxxxx).
• The MSK cluster's security group allows traffic on port 2181, port 9092 (if your cluster accepts
plaintext traffic), and port 9094 (if your cluster accepts TLS-encrypted traffic).
For more AWS VPN troubleshooting guidance, see Troubleshooting Client VPN.
AWS Direct Connect
If the client uses AWS Direct Connect, see Troubleshooting AWS Direct Connect.
If the previous troubleshooting guidance doesn't resolve the issue, ensure that no firewall is
blocking network traffic. For further debugging, use tools like tcpdump and Wireshark to analyze
traffic and to make sure that it is reaching the MSK cluster.
Failed authentication: Too many connects
The Failed authentication ... Too many connects error indicates that a broker is
protecting itself because one or more IAM clients are trying to connect to it at an aggressive
rate. To help brokers accept a higher rate of new IAM connections, you can increase the
reconnect.backoff.ms configuration parameter.
To learn more about the rate limits for new connections per broker, see the Amazon MSK quota
page.
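A client configuration sketch; the values shown are illustrative rather than prescriptive:
# client properties: back off between reconnection attempts
reconnect.backoff.ms=1000
reconnect.backoff.max.ms=10000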
MSK Serverless: Cluster creation fails
If you try to create an MSK Serverless cluster and the workflow fails, you may not have permission
to create a VPC endpoint. Verify that your administrator has granted you permission to create a
VPC endpoint by allowing the ec2:CreateVpcEndpoint action.
For a complete list of permissions required to perform all Amazon MSK actions, see AWS managed
policy: AmazonMSKFullAccess.
Best practices
This topic outlines some best practices to follow when using Amazon MSK. For information about
Amazon MSK Replicator best practices, see Best practices for using MSK Replicator.
Right-size your cluster: Number of partitions per broker
The following table shows the recommended number of partitions (including leader and follower
replicas) per broker.
Broker size                                  Recommended number of partitions (including
                                             leader and follower replicas) per broker
kafka.t3.small                               300
kafka.m5.large or kafka.m5.xlarge            1000
kafka.m5.2xlarge                             2000
kafka.m5.4xlarge, kafka.m5.8xlarge,          4000
kafka.m5.12xlarge, kafka.m5.16xlarge,
or kafka.m5.24xlarge
kafka.m7g.large or kafka.m7g.xlarge          1000
kafka.m7g.2xlarge                            2000
kafka.m7g.4xlarge, kafka.m7g.8xlarge,        4000
kafka.m7g.12xlarge, or kafka.m7g.16xlarge
If the number of partitions per broker exceeds the recommended value and your cluster becomes
overloaded, you may be prevented from performing the following operations:
• Update the cluster configuration
• Update the cluster to a smaller broker size
• Associate an AWS Secrets Manager secret with a cluster that has SASL/SCRAM authentication
A high number of partitions can also result in missing Kafka metrics on CloudWatch and on
Prometheus scraping.
For guidance on choosing the number of partitions, see Apache Kafka Supports 200K Partitions
Per Cluster. We also recommend that you perform your own testing to determine the right size for
your brokers. For more information about the different broker sizes, see the section called “Amazon
MSK broker sizes”.
Right-size your cluster: Number of brokers per cluster
To determine the right number of brokers for your MSK cluster and understand costs, see the MSK
Sizing and Pricing spreadsheet. This spreadsheet provides an estimate for sizing an MSK cluster
and the associated costs of Amazon MSK compared to a similar, self-managed, EC2-based Apache
Kafka cluster. For more information about the input parameters in the spreadsheet, hover over the
parameter descriptions. Estimates provided by this sheet are conservative and provide a starting
point for a new cluster. Cluster performance, size, and costs are dependent on your use case and
we recommend that you verify them with actual testing.
To understand how the underlying infrastructure affects Apache Kafka performance, see Best
practices for right-sizing your Apache Kafka clusters to optimize performance and cost in the AWS
Big Data Blog. The blog post provides information about how to size your clusters to meet your
throughput, availability, and latency requirements. It also provides answers to questions such as
when you should scale up versus scale out, and guidance on how to continuously verify the size of
your production clusters.
Optimize cluster throughput for m5.4xl, m7g.4xl or larger
instances
When using m5.4xl, m7g.4xl, or larger instances, you can optimize the cluster throughput by tuning
the num.io.threads and num.network.threads configurations.
num.io.threads is the number of threads that a broker uses for processing requests. Adding more
threads, up to the number of CPU cores supported for the instance size, can help improve cluster
throughput.
num.network.threads is the number of threads the broker uses for receiving all incoming requests
and returning responses. Network threads place incoming requests on a request queue for
processing by io.threads. Setting num.network.threads to half the number of CPU cores supported
for the instance size allows for full usage of the new instance size.
Important
Do not increase num.network.threads without first increasing num.io.threads as this can
lead to congestion related to queue saturation.
Recommended settings
Instance size       Recommended value for      Recommended value for
                    num.io.threads             num.network.threads
m5.4xl              16                         8
m5.8xl              32                         16
m5.12xl             48                         24
m5.16xl             64                         32
m5.24xl             96                         48
m7g.4xlarge         16                         8
m7g.8xlarge         32                         16
m7g.12xlarge        48                         24
m7g.16xlarge        64                         32
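For example, on a kafka.m5.4xlarge cluster, a custom Amazon MSK configuration sketch applying
the values from this table would contain:
# custom MSK configuration properties for m5.4xl brokers
num.io.threads=16
num.network.threads=8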
Use latest Kafka AdminClient to avoid topic ID mismatch issue
The ID of a topic is lost (Error: does not match the topic Id for partition) when you use a Kafka
AdminClient version lower than 2.8.0 with the flag --zookeeper to increase or reassign topic
partitions for a cluster using Kafka version 2.8.0 or higher. Note that the --zookeeper flag is
deprecated in Kafka 2.5 and is removed starting with Kafka 3.0. See Upgrading to 2.5.0 from any
version 0.8.x through 2.4.x.
To prevent topic ID mismatch, use a Kafka client version 2.8.0 or higher for Kafka admin
operations. Alternatively, clients 2.5 and higher can use the --bootstrap-servers flag instead
of the --zookeeper flag.
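For example, here is a sketch of increasing a topic's partition count without the --zookeeper flag;
$bs and my-topic are placeholders:
<path-to-your-kafka-installation>/bin/kafka-topics.sh --bootstrap-server $bs --alter --topic my-topic --partitions 12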
Build highly available clusters
Use the following recommendations so that your MSK cluster can be highly available during an
update (such as when you're updating the broker size or the Apache Kafka version) or when
Amazon MSK is replacing a broker.
• Set up a three-AZ cluster.
• Ensure that the replication factor (RF) is at least 3. Note that an RF of 1 can lead to offline
partitions during a rolling update, and an RF of 2 may lead to data loss.
• Set minimum in-sync replicas (minISR) to at most RF - 1. A minISR that is equal to the RF
can prevent producing to the cluster during a rolling update. A minISR of 2 allows three-way
replicated topics to be available when one replica is offline.
• Ensure client connection strings include at least one broker from each Availability Zone. Having
multiple brokers in a client's connection string allows for failover when a specific broker is offline
for an update. For information about how to get a connection string with multiple brokers, see
the section called “Get the bootstrap brokers for an Amazon MSK cluster”.
Monitor CPU usage
Amazon MSK strongly recommends that you maintain the total CPU utilization for your brokers
(defined as CPU User + CPU System) under 60%. When you have at least 40% of your cluster's
total CPU available, Apache Kafka can redistribute CPU load across brokers in the cluster when
necessary. One example of when this is necessary is when Amazon MSK detects and recovers
from a broker fault; in this case, Amazon MSK performs automatic maintenance, like patching.
Another example is when a user requests a broker-size change or version upgrade; in these two
cases, Amazon MSK deploys rolling workflows that take one broker offline at a time. When brokers
with lead partitions go offline, Apache Kafka reassigns partition leadership to redistribute work to
other brokers in the cluster. By following this best practice you can ensure you have enough CPU
headroom in your cluster to tolerate operational events like these.
You can use Amazon CloudWatch metric math to create a composite metric that is CPU User +
CPU System. Set an alarm that gets triggered when the composite metric reaches an average
CPU utilization of 60%; a CLI sketch for such an alarm appears at the end of this section. When
this alarm is triggered, scale the cluster using one of the following options:
Option 1 (recommended): Update your broker size to the next larger size. For example, if the
current size is kafka.m5.large, update the cluster to use kafka.m5.xlarge. Keep in mind
that when you update the broker size in the cluster, Amazon MSK takes brokers offline in a
rolling fashion and temporarily reassigns partition leadership to other brokers. A size update
typically takes 10-15 minutes per broker.
Option 2: If there are topics with all messages ingested from producers that use round-robin
writes (in other words, messages aren't keyed and ordering isn't important to consumers),
expand your cluster by adding brokers. Also add partitions to existing topics with the highest
throughput. Next, use kafka-topics.sh --describe to ensure that newly added partitions
are assigned to the new brokers. The main benefit of this option compared to the previous one is
that you can manage resources and costs more granularly. Additionally, you can use this option
if CPU load significantly exceeds 60% because this form of scaling doesn't typically result in
increased load on existing brokers.
Option 3: Expand your cluster by adding brokers, then reassign existing partitions by using the
partition reassignment tool named kafka-reassign-partitions.sh. However, if you use
this option, the cluster will need to spend resources to replicate data from broker to broker
after partitions are reassigned. Compared to the two previous options, this can significantly
increase the load on the cluster at first. As a result, Amazon MSK doesn't recommend using this
option when CPU utilization is above 70% because replication causes additional CPU load and
network traffic. Amazon MSK only recommends using this option if the two previous options
aren't feasible.
Other recommendations:
• Monitor total CPU utilization per broker as a proxy for load distribution. If brokers have
consistently uneven CPU utilization, it might be a sign that load isn't evenly distributed within the
cluster. Amazon MSK recommends using Cruise Control to continuously manage load distribution
via partition assignment.
• Monitor produce and consume latency. Produce and consume latency can increase linearly with
CPU utilization.
• JMX scrape interval: If you enable open monitoring with the Prometheus feature, use a 60-second
or higher scrape interval (scrape_interval: 60s) for your Prometheus host configuration
(prometheus.yml). Lowering the scrape interval can lead to high CPU usage on your cluster.
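Here is a sketch of the composite CPU alarm mentioned above, using CloudWatch metric math
through the AWS CLI; the cluster name, broker ID, and SNS topic ARN are placeholders:
aws cloudwatch put-metric-alarm --alarm-name msk-broker1-cpu-60 \
    --comparison-operator GreaterThanOrEqualToThreshold --threshold 60 \
    --evaluation-periods 3 \
    --alarm-actions arn:aws:sns:us-east-1:012345678012:msk-alerts \
    --metrics '[
      {"Id":"user","MetricStat":{"Metric":{"Namespace":"AWS/Kafka","MetricName":"CpuUser","Dimensions":[{"Name":"Cluster Name","Value":"MyCluster"},{"Name":"Broker ID","Value":"1"}]},"Period":300,"Stat":"Average"},"ReturnData":false},
      {"Id":"system","MetricStat":{"Metric":{"Namespace":"AWS/Kafka","MetricName":"CpuSystem","Dimensions":[{"Name":"Cluster Name","Value":"MyCluster"},{"Name":"Broker ID","Value":"1"}]},"Period":300,"Stat":"Average"},"ReturnData":false},
      {"Id":"total","Expression":"user+system","Label":"CpuUser + CpuSystem","ReturnData":true}]'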
Monitor disk space
To avoid running out of disk space for messages, create a CloudWatch alarm that watches the
KafkaDataLogsDiskUsed metric. When the value of this metric reaches or exceeds 85%, perform
one or more of the following actions:
• Use the section called “Automatic scaling for Amazon MSK clusters”. You can also manually
increase broker storage as described in the section called “Manual scaling”.
• Reduce the message retention period or log size. For information on how to do that, see the
section called “Adjust data retention parameters”.
• Delete unused topics.
For information on how to set up and use alarms, see Using Amazon CloudWatch Alarms. For a full
list of Amazon MSK metrics, see Monitor a cluster.
Adjust data retention parameters
Consuming messages doesn't remove them from the log. To free up disk space regularly, you can
explicitly specify a retention time period, which is how long messages stay in the log. You can also
specify a retention log size. When either the retention time period or the retention log size are
reached, Apache Kafka starts removing inactive segments from the log.
To specify a retention policy at the cluster level, set one or more of the following
parameters: log.retention.hours, log.retention.minutes, log.retention.ms, or
log.retention.bytes. For more information, see the section called “Custom Amazon MSK
configurations”.
You can also specify retention parameters at the topic level:
To specify a retention time period per topic, use the following command.
kafka-configs.sh --bootstrap-server $bs --alter --entity-type topics --entity-name TopicName --add-config retention.ms=DesiredRetentionTimePeriod
To specify a retention log size per topic, use the following command.
kafka-configs.sh --bootstrap-server $bs --alter --entity-type topics --entity-name TopicName --add-config retention.bytes=DesiredRetentionLogSize
The retention parameters that you specify at the topic level take precedence over cluster-level
parameters.
Speeding up log recovery after unclean shutdown
After an unclean shutdown, a broker can take a while to restart as it does log recovery. By
default, Kafka only uses a single thread per log directory to perform this recovery. For example,
if you have thousands of partitions, log recovery can take hours to complete. To speed up log
recovery, it's recommended to increase the number of threads using configuration property
num.recovery.threads.per.data.dir. You can set it to the number of CPU cores.
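For example, here is a custom configuration sketch assuming brokers with 16 CPU cores; match
the value to your broker size:
# custom MSK configuration: parallelize log recovery across 16 threads
num.recovery.threads.per.data.dir=16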
Monitor Apache Kafka memory
We recommend that you monitor the memory that Apache Kafka uses. Otherwise, the cluster may
become unavailable.
To determine how much memory Apache Kafka uses, you can monitor the HeapMemoryAfterGC
metric. HeapMemoryAfterGC is the percentage of total heap memory that is in use after
garbage collection. We recommend that you create a CloudWatch alarm that takes action when
HeapMemoryAfterGC increases above 60%.
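A sketch of such an alarm with the AWS CLI; the cluster name, broker ID, and SNS topic ARN are
placeholders:
aws cloudwatch put-metric-alarm --alarm-name msk-heap-after-gc-60 \
    --namespace AWS/Kafka --metric-name HeapMemoryAfterGC \
    --dimensions Name="Cluster Name",Value=MyCluster Name="Broker ID",Value=1 \
    --statistic Average --period 300 --evaluation-periods 3 \
    --threshold 60 --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:012345678012:msk-alerts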
The steps that you can take to decrease memory usage vary. They depend on the way that you
configure Apache Kafka. For example, if you use transactional message delivery, you can decrease
the transactional.id.expiration.ms value in your Apache Kafka configuration from
604800000 ms to 86400000 ms (from 7 days to 1 day). This decreases the memory footprint of
each transaction.
Don't add non-MSK brokers
For ZooKeeper-based clusters, if you use Apache ZooKeeper commands to add brokers, these
brokers don't get added to your MSK cluster, and your Apache ZooKeeper will contain incorrect
information about the cluster. This might result in data loss. For supported cluster operations, see
How it works.
Enable in-transit encryption
For information about encryption in transit and how to enable it, see the section called “Amazon
MSK encryption in transit”.
Reassign partitions
To move partitions to different brokers on the same cluster, you can use the partition reassignment
tool named kafka-reassign-partitions.sh. For example, after you add new brokers to
expand a cluster, or when you want to move partitions off brokers in order to remove them, you
can rebalance that cluster by reassigning partitions. For information about how to add brokers to a
cluster, see the section called “Expand an Amazon MSK cluster”. For information about how to
remove brokers from a cluster, see the section called “Remove a broker”. For information about the
partition reassignment tool, see Expanding your cluster in the Apache Kafka documentation.
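A sketch of the generate-then-execute flow; $bs, the broker IDs, and the JSON file names are
placeholders, and topics-to-move.json lists the topics to rebalance (for example,
{"topics":[{"topic":"my-topic"}],"version":1}):
<path-to-your-kafka-installation>/bin/kafka-reassign-partitions.sh --bootstrap-server $bs \
    --topics-to-move-json-file topics-to-move.json --broker-list "1,2,3,4" --generate
# save the proposed assignment from the output to plan.json, then apply it
<path-to-your-kafka-installation>/bin/kafka-reassign-partitions.sh --bootstrap-server $bs \
    --reassignment-json-file plan.json --execute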
Document history for Amazon MSK Developer Guide
The following list describes the important changes to the Amazon MSK Developer Guide.
Latest documentation update: June 25, 2024

2024-06-25: Graviton upgrade in place feature added. You can update your cluster broker size from M5 or T3 to M7g, or from M7g to M5.
2024-06-24: 3.4.0 end of support date announced. The end of support date for Apache Kafka version 3.4.0 is June 17, 2025.
2024-05-16: Broker removal feature added. You can reduce your provisioned cluster's storage and compute capacity by removing sets of brokers, with no availability impact, data durability risk, or disruption to your data streaming applications.
2024-05-16: WriteDataIdempotently added to AWSMSKReplicatorExecutionRole. The WriteDataIdempotently permission is added to the AWSMSKReplicatorExecutionRole policy to support data replication between MSK clusters.
2024-02-07: Graviton M7g brokers released in Brazil and Bahrain. Amazon MSK now supports South America (sa-east-1, São Paulo) and Middle East (me-south-1, Bahrain) region availability of M7g brokers using AWS Graviton processors (custom Arm-based processors built by Amazon Web Services).
2024-01-11: Graviton M7g brokers released in the China region. Amazon MSK now supports China region availability of M7g brokers using AWS Graviton processors (custom Arm-based processors built by Amazon Web Services).
2023-12-08: Amazon MSK Kafka version support policy. Added an explanation of the Amazon MSK Kafka version support policy. For more information, see Apache Kafka versions.
2023-12-06: New service execution role policy to support Amazon MSK Replicator. Amazon MSK added the new AWSMSKReplicatorExecutionRole policy to support Amazon MSK Replicator. For more information, see AWS managed policy: AWSMSKReplicatorExecutionRole.
2023-11-27: M7g Graviton support. Amazon MSK now supports M7g brokers using AWS Graviton processors (custom Arm-based processors built by Amazon Web Services).
2023-09-28: Amazon MSK Replicator. Amazon MSK Replicator is a new feature that you can use to replicate data between Amazon MSK clusters. Amazon MSK Replicator includes an update to the AmazonMSKFullAccess policy. For more information, see AWS managed policy: AmazonMSKFullAccess.
2023-03-08: Updated for IAM best practices. Updated the guide to align with the IAM best practices. For more information, see Security best practices in IAM.
2023-03-08: Service-linked role updates to support multi-VPC private connectivity. Amazon MSK now includes AWSServiceRoleForKafka service-linked role updates to manage network interfaces and VPC endpoints in your account that make cluster brokers accessible to clients in your VPC. Amazon MSK uses permissions to DescribeVpcEndpoints, ModifyVpcEndpoint, and DeleteVpcEndpoints. For more information, see Service-linked roles for Amazon MSK.
2021-12-21: Support for Apache Kafka 2.7.2. Amazon MSK now supports Apache Kafka version 2.7.2. For more information, see Supported Apache Kafka versions.
2021-12-21: Support for Apache Kafka 2.6.3. Amazon MSK now supports Apache Kafka version 2.6.3. For more information, see Supported Apache Kafka versions.
2021-11-29: MSK Serverless prerelease. MSK Serverless is a new feature that you can use to create serverless clusters. For more information, see MSK Serverless.
2021-09-30: Support for Apache Kafka 2.8.1. Amazon MSK now supports Apache Kafka version 2.8.1. For more information, see Supported Apache Kafka versions.
2021-09-16: MSK Connect. MSK Connect is a new feature that you can use to create and manage Apache Kafka connectors. For more information, see Understand MSK Connect.
2021-05-25: Support for Apache Kafka 2.7.1. Amazon MSK now supports Apache Kafka version 2.7.1. For more information, see Supported Apache Kafka versions.
2021-04-28: Support for Apache Kafka 2.8.0. Amazon MSK now supports Apache Kafka version 2.8.0. For more information, see Supported Apache Kafka versions.
2021-04-28: Support for Apache Kafka 2.6.2. Amazon MSK now supports Apache Kafka version 2.6.2. For more information, see Supported Apache Kafka versions.
2021-01-21: Support for updating the broker type. You can now change the broker type for an existing cluster. For more information, see Update the Amazon MSK cluster broker size.
2021-01-19: Support for Apache Kafka 2.6.1. Amazon MSK now supports Apache Kafka version 2.6.1. For more information, see Supported Apache Kafka versions.
2020-12-29: Support for Apache Kafka 2.7.0. Amazon MSK now supports Apache Kafka version 2.7.0. For more information, see Supported Apache Kafka versions.
2020-11-24: No new clusters with Apache Kafka version 1.1.1. You can no longer create a new Amazon MSK cluster with Apache Kafka version 1.1.1. However, if you have existing MSK clusters that are running Apache Kafka version 1.1.1, you can continue using all of the currently supported features on those existing clusters. For more information, see Apache Kafka versions.
2020-11-23: Consumer-lag metrics. Amazon MSK now provides metrics that you can use to monitor consumer lag. For more information, see Monitor an Amazon MSK cluster.
2020-11-17: Support for Cruise Control. Amazon MSK now supports LinkedIn's Cruise Control. For more information, see Use LinkedIn's Cruise Control for Apache Kafka with Amazon MSK.
2020-10-21: Support for Apache Kafka 2.6.0. Amazon MSK now supports Apache Kafka version 2.6.0. For more information, see Supported Apache Kafka versions.
2020-09-30: Support for Apache Kafka 2.5.1. Amazon MSK now supports Apache Kafka version 2.5.1. With Apache Kafka version 2.5.1, Amazon MSK supports encryption in transit between clients and ZooKeeper endpoints. For more information, see Supported Apache Kafka versions.
2020-09-30: Application auto-expansion. You can configure Amazon Managed Streaming for Apache Kafka to automatically expand your cluster's storage in response to increased usage. For more information, see Automatic scaling for Amazon MSK clusters.
2020-09-17: Support for username and password security. Amazon MSK now supports logging into clusters using a username and password. Amazon MSK stores credentials in AWS Secrets Manager. For more information, see SASL/SCRAM authentication.
2020-05-28: Support for upgrading the Apache Kafka version of an Amazon MSK cluster. You can now upgrade the Apache Kafka version of an existing MSK cluster.
2020-04-08: Support for T3.small broker nodes. Amazon MSK now supports creating clusters with brokers of Amazon EC2 type T3.small.
2020-04-02: Support for Apache Kafka 2.4.1. Amazon MSK now supports Apache Kafka version 2.4.1.
2020-02-25: Support for streaming broker logs. Amazon MSK can now stream broker logs to CloudWatch Logs, Amazon S3, and Amazon Data Firehose. Firehose can, in turn, deliver these logs to the destinations that it supports, such as OpenSearch Service.
2019-12-19: Support for Apache Kafka 2.3.1. Amazon MSK now supports Apache Kafka version 2.3.1.
2019-12-04: Open monitoring. Amazon MSK now supports open monitoring with Prometheus.
2019-07-31: Support for Apache Kafka 2.2.1. Amazon MSK now supports Apache Kafka version 2.2.1.
2019-05-30: General availability. New features include tagging support, authentication, TLS encryption, configurations, and the ability to update broker storage.
2019-02-05: Support for Apache Kafka 2.1.0. Amazon MSK now supports Apache Kafka version 2.1.0.