How to debug no join result in KSQL

Hugo
2 min readJul 24, 2019

We know the restrictions of joins when using KSQL. Two sources must be partitioned by the same key (join key) and have the same number of partitions.

https://docs.confluent.io/current/ksql/docs/developer-guide/join-streams-and-tables.html#join-requirements

But when it still does not work, that join does not produce correct result so that join does not happen, and no error message is produced. What should I look for? I will share my experience as I didn’t find useful resource.

No new data

I did go through https://www.confluent.io/blog/troubleshooting-ksql-part-1 and https://docs.confluent.io/current/ksql/docs/troubleshoot-ksql.html.

For me the important takeaways is to ensure new data comes in to the source you want to check, and use

SET ‘auto.offset.reset’ = ‘earliest’;

is very convenient. Besides that, when you create table or stream, it would be very useful to check you can see some data:

select * from MY_STREAM limit 10;

select * from MY_TABLE limit 10;

Now I still don’t see any join result after running the join query, what happened?

Join key? rowkey?

Basically the rowkey should be the same, because it is the key used to join. What here means is the rowkey should be the same and you can see it.

select rowkey from clicks;

select rowkey from events;

What I experienced is the first one returns

“1”

“2”

“1”

and the following for the second

1

1

2

So the problem is the key of the first one should not have “”. Due to this difference, the join fail to produce correct result. The error comes from the key of the sample data used incorrectly, it should not have “”. I will share some test input here:

“1”, {“price”: 1}

is wrong, and it should be

1, {“price”:1}

The other thing worth mentioning is

CREATE STREAM … WITH(key=’id’)

does not repartition. If you check

select rowkey from YOUR_STREAM;

you will see in the query result, the key is what is produced to kafka.

You have to use persistent query that

CREATE STREAM … AS SELECT * FROM YOUR_STREAM PARTITION BY YOUR_JOIN_KEY;

And the above would create a topic that is partition by the key that used by join.

As a best practice, I would recommend to always use ROWKEY as join key, because using other field name, it does not mean it will work, and ROWKEY is the only approach to make it work.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

Write a response