
[flink] Fix batch query on empty datalake-enabled table to return 0 rows instead of failing#3208

Open
matrixsparse wants to merge 2 commits into apache:main from
matrixsparse:feature/issue-3207-empty-lake-batch-query

Conversation

@matrixsparse
Contributor

Summary

  • When a datalake-enabled table is empty (no lake snapshot), a batch query via Flink SQL
    throws UnsupportedOperationException. This PR makes it return an empty result
    (0 rows) instead, consistent with the Spark connector's behavior.

Root Cause

In FlinkSourceEnumerator.startInBatchMode(), generateHybridLakeFlussSplits()
returns null when no lake snapshot exists. The stream mode code in the same
commit (7937996) correctly falls back to initNonPartitionedSplits(), but the
batch mode path was left to throw an exception.
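A minimal sketch of that divergence, assuming a simplified, non-partitioned table whose splits are plain strings. `EnumeratorSketch`, its constructor arguments, and the split names are hypothetical; only the method names and the control flow (a null result from the hybrid planner, a fallback in stream mode, an exception in batch mode) follow the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the enumerator's two planning paths.
// Splits are plain strings here; the real enumerator works with SourceSplitBase.
class EnumeratorSketch {
    private final boolean hasLakeSnapshot; // assumption: stands in for "a lake snapshot exists"
    private final int numBuckets;          // assumption: non-partitioned table with N buckets

    EnumeratorSketch(boolean hasLakeSnapshot, int numBuckets) {
        this.hasLakeSnapshot = hasLakeSnapshot;
        this.numBuckets = numBuckets;
    }

    // Mirrors the contract described in the PR: returns null when no lake snapshot exists.
    List<String> generateHybridLakeFlussSplits() {
        if (!hasLakeSnapshot) {
            return null;
        }
        List<String> splits = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            splits.add("hybrid-lake-fluss-bucket-" + b);
        }
        return splits;
    }

    // Hypothetical stand-in for initNonPartitionedSplits(): one Fluss-only split per bucket.
    List<String> initNonPartitionedSplits() {
        List<String> splits = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            splits.add("fluss-log-bucket-" + b);
        }
        return splits;
    }

    // Stream mode: falls back to Fluss-only splits when the hybrid planner returns null.
    List<String> startInStreamMode() {
        List<String> splits = generateHybridLakeFlussSplits();
        return splits != null ? splits : initNonPartitionedSplits();
    }

    // Batch mode before the fix: the null case is rejected outright.
    List<String> startInBatchModeBeforeFix() {
        List<String> splits = generateHybridLakeFlussSplits();
        if (splits == null) {
            throw new UnsupportedOperationException(
                    "Currently, Batch mode can only be supported if one lake snapshot exists for the table.");
        }
        return splits;
    }
}
```

Against a table with no lake snapshot, the batch path reproduces the reported UnsupportedOperationException, while the stream path on the same table still produces Fluss splits.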

Change

  • FlinkSourceEnumerator.java: Return Collections.emptyList() instead of
    throwing UnsupportedOperationException when splits is null.

Verification

  • Ran mvnw test -pl fluss-flink/fluss-flink-common -Dtest="FlinkSourceEnumerator*"
  • All existing tests pass

Fixes #3207

Contributor

@binary-signal binary-signal left a comment


LGTM +1

@matrixsparse
Contributor Author

Thanks for the review!

@binary-signal
Contributor

@fresh-borzoni @luoyuxia PTAL

Contributor

@luoyuxia luoyuxia left a comment


@matrixsparse Thanks for the PR. Left one comment. PTAL

 if (splits == null) {
-    throw new UnsupportedOperationException(
-            "Currently, Batch mode can only be supported if one lake snapshot exists for the table.");
+    return Collections.emptyList();
Contributor


Why return an empty list? An empty list makes Flink return an empty result even though there are actually still records in Fluss.
One option is to still generate Fluss-only splits when no lake snapshot exists,
just like Spark does in FlussLakeAppendBatch#doPlan.
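To make the concern concrete, here is a toy model, assuming a table whose records still sit only in the Fluss log, so no lake snapshot exists yet. `EmptyLakeFallbackSketch` and its bucket-to-records map are hypothetical, and this is not how FlussLakeAppendBatch#doPlan is implemented; it only contrasts an empty plan with a Fluss-only plan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: bucket id -> records currently held in the Fluss log.
// Nothing has been tiered to the lake yet, so there is no lake snapshot.
class EmptyLakeFallbackSketch {
    private final Map<Integer, List<String>> flussLog;

    EmptyLakeFallbackSketch(Map<Integer, List<String>> flussLog) {
        // TreeMap keeps buckets in ascending order so reads are deterministic.
        this.flussLog = new TreeMap<>(flussLog);
    }

    // "No lake snapshot -> plan nothing": the approach questioned in the review.
    List<Integer> planEmpty() {
        return Collections.emptyList();
    }

    // "No lake snapshot -> plan Fluss-only splits": one split per bucket in this model.
    List<Integer> planFlussOnly() {
        return new ArrayList<>(flussLog.keySet());
    }

    // A bounded read drains every planned bucket and nothing else.
    List<String> read(List<Integer> plannedBuckets) {
        List<String> rows = new ArrayList<>();
        for (int bucket : plannedBuckets) {
            rows.addAll(flussLog.getOrDefault(bucket, Collections.emptyList()));
        }
        return rows;
    }
}
```

With records in two buckets, the empty plan reads nothing, while the Fluss-only plan returns every record.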

Contributor Author


Thanks for the feedback. This has been updated: rather than returning an empty list, the logic now falls back to Fluss-only split generation, consistent with Spark's planFallbackPartitions(). Partition validation is also added for partitioned tables. @luoyuxia PTAL, thanks!
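One possible shape for such a fallback, assuming Fluss-only splits are planned per bucket for non-partitioned tables and per partition and bucket for partitioned ones. The thread does not say exactly what the added partition validation checks; the sketch assumes it rejects a missing partition list and treats an empty one as a table with nothing to read. `FallbackPlannerSketch.planFallbackSplits` and the split naming are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Fluss-only fallback plan used when no lake snapshot exists.
// Split ids are strings like "p=2024/bucket-0"; nothing here is the real Fluss API.
class FallbackPlannerSketch {

    static List<String> planFallbackSplits(boolean partitioned, List<String> partitions, int numBuckets) {
        List<String> splits = new ArrayList<>();
        if (!partitioned) {
            // Non-partitioned table: one Fluss-only split per bucket.
            for (int b = 0; b < numBuckets; b++) {
                splits.add("bucket-" + b);
            }
            return splits;
        }
        // Assumed validation: a partitioned table must supply a partition list.
        if (partitions == null) {
            throw new IllegalStateException("partition list is required for a partitioned table");
        }
        // An empty partition list yields an empty plan (0 rows), not an error.
        for (String partition : partitions) {
            for (int b = 0; b < numBuckets; b++) {
                splits.add("p=" + partition + "/bucket-" + b);
            }
        }
        return splits;
    }
}
```

Under these assumptions, a partitioned table with no partitions yields an empty plan, so the query returns 0 rows instead of failing, while any table that does hold Fluss data gets splits covering all of it.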

@matrixsparse matrixsparse force-pushed the feature/issue-3207-empty-lake-batch-query branch from f04fd1a to 7339c52 on April 30, 2026 12:40
Contributor

@fresh-borzoni fresh-borzoni left a comment


@matrixsparse Ty for the PR, LGTM 👍

() -> {
    List<SourceSplitBase> splits = generateHybridLakeFlussSplits();
    // No lake snapshot exists, fall back to Fluss-only splits
    if (splits == null) {
Contributor

@fresh-borzoni fresh-borzoni Apr 30, 2026


nit: it might be a good idea to add an info log for easier tracing; stream mode already logs when it goes through the lake path.

Something like:

LOG.info("No lake snapshot found for table {}, falling back to Fluss-only splits.", tablePath);
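The suggested line uses SLF4J-style `{}` placeholders. To keep a runnable sketch free of dependencies, the version below uses java.util.logging instead, whose placeholders follow java.text.MessageFormat (`{0}`). `FallbackLogSketch` and its capturing handler are hypothetical and exist only to show the formatted message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Sketch of the suggested trace log using java.util.logging so it runs without SLF4J.
// JUL placeholders are MessageFormat-style ("{0}"), unlike SLF4J's "{}".
class FallbackLogSketch {
    // Static reference keeps the logger (and its handlers) from being garbage collected.
    static final Logger LOG = Logger.getLogger(FallbackLogSketch.class.getName());

    static void logFallback(String tablePath) {
        LOG.log(Level.INFO,
                "No lake snapshot found for table {0}, falling back to Fluss-only splits.",
                tablePath);
    }

    // Collects each record's formatted message so the log line can be inspected.
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();
        private final SimpleFormatter formatter = new SimpleFormatter();

        @Override
        public void publish(LogRecord record) {
            // formatMessage substitutes the record's parameters into "{0}".
            messages.add(formatter.formatMessage(record));
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }
}
```

With SLF4J, as in the suggestion above, the `{}` form avoids building the string when INFO is disabled; the JUL `{0}` form defers formatting to the handler in a similar way.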

Contributor Author


Good suggestion! Added the info log for tracing. Thanks @fresh-borzoni!

Contributor

@fresh-borzoni fresh-borzoni left a comment


Ty, +1



Development

Successfully merging this pull request may close these issues.

[flink] Batch query on empty datalake-enabled table should return 0 rows instead of failing with UnsupportedOperationException

4 participants