feat: add MSSQL/Fabric support to data-parity skill#705
feat: add MSSQL/Fabric support to data-parity skill#705suryaiyer95 wants to merge 20 commits intomainfrom
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Microsoft Fabric support and dialect-aware data-diff logic, extends the SQL Server driver with Azure AD token acquisition (cached) plus az CLI fallback, normalizes Fabric connection fields, bumps optional mssql to ^12.0.0, and includes tests and documentation updates. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Driver as SQL Server Driver
participant AzureId as `@azure/identity`
participant CLI as az CLI
participant AzureAD as Azure AD
participant MsSQL as mssql Library
App->>Driver: connect(config with Azure AD auth)
Driver->>Driver: normalize auth type & resolve resource URL
alt azure identity available
Driver->>AzureId: getToken(scope)
AzureId->>AzureAD: request token
AzureAD-->>AzureId: return token
AzureId-->>Driver: access token (cached)
else fallback to az CLI
Driver->>CLI: exec("az account get-access-token --resource <res>")
CLI->>AzureAD: request token
AzureAD-->>CLI: return token
CLI-->>Driver: token string (cached)
end
Driver->>MsSQL: new ConnectionPool(config with token)
MsSQL-->>Driver: connected pool
Driver-->>App: connected instance
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
5 issues found across 30 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>
<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>
<file name="packages/opencode/src/altimate/native/connections/registry.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>
<file name="packages/opencode/src/altimate/tools/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
403477e to
811c2be
Compare
|
@suryaiyer95 can you address code review comments? |
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:90">
P2: Avoid `execSync` in the async connection path; it can block the event loop for up to the CLI timeout.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:819">
P2: `crossWarehouse` is computed from unresolved params, so an omitted `source_warehouse` can be misclassified as cross-warehouse when both sides actually run on the same default warehouse.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Bun's runtime never fires native addon async callbacks, so the async `new duckdb.Database(path, opts, callback)` form would hit the 2-second timeout fallback on every connection attempt. Switch to the synchronous constructor form `new duckdb.Database(path)` / `new duckdb.Database(path, opts)` which throws on error and completes immediately in both Node and bun runtimes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly at e3df5a4. The timeout was caused by a missing duckdb .node binary, not a bun incompatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql, fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical serde mismatch where Rust engine rejects raw warehouse type names. - Update both `resolveDialect()` functions to use the mapping - Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col) - Add locale-safe date literal casting via CONVERT(DATE, ..., 23) - Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver) - Add `fabric` normalize aliases in normalize.ts - Add 15 SQL Server driver unit tests (TOP injection, truncation, schema introspection, connection lifecycle, result format) - Add 9 dialect mapping unit tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`: `azure-active-directory-password`, `access-token`, `service-principal-secret`, `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential` - Force TLS encryption for all Azure AD connections - Dynamic import of `@azure/identity` for `DefaultAzureCredential` - Add normalize aliases for Azure AD config fields (`authentication`, `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`) - Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES - Add 10 Azure AD unit tests covering all auth flows, encryption, and `DefaultAzureCredential` with managed identity Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md - Add SQL Server / Fabric schema inspection query in Step 2 - Add "SQL Server and Microsoft Fabric" section with: - Supported configurations table (sqlserver, mssql, fabric) - Fabric connection guide with Azure AD auth types - Algorithm behavior notes (joindiff vs hashdiff selection) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter
- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
`@azure/identity` internally and creates credentials — avoids bun CJS/ESM
`isTokenCredential` boundary issue that caused "not an instance of the token
credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
`msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
columns — it stripped legitimate aliases like `_p` used by partition discovery,
causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
real MSSQL server (profile, joindiff, auto, where_clause, partitioned).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lattening - Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19) - Use explicit `ConnectionPool` instead of global `mssql.connect()` to isolate multiple simultaneous connections - Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g. `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string key; restore positional column values - Proper column name resolution: compare `namedKeys.length` against flattened row length, fall back to synthetic `col_0`, `col_1`, etc. - Update test mock to export `ConnectionPool` class and `createMockPool` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions
Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- P1: restrict `flattenRow` to only spread the empty-string key (`""`) where mssql merges unnamed columns, preserving legitimate array values - P2: escape single quotes in `partitionValue` for date-mode branches in `buildPartitionWhereClause` (categorical mode already escaped) - P2: add `fabric` to `PASSWORD_DRIVERS` set in registry for consistent password validation alongside `sqlserver`/`mssql` - P2: fallback to `"(no values)"` when `d.values` is nullish to prevent template literal coercing `undefined` to the string `"undefined"` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sqlserver-unit: 3 tests for unnamed column flattening — verifies only the empty-string key is spread, legitimate named arrays are preserved - driver-normalize: fabric type uses SQLSERVER_ALIASES (server → host, trustServerCertificate → trust_server_certificate) - connections: fabric type is recognized in DRIVER_MAP and listed correctly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Minimum Version Requirements" table to SKILL.md covering SQL Server 2022+, mssql v12, and @azure/identity v4 with rationale for each - Document auth shorthands (CLI, default, password, service-principal, msi) - Move @azure/identity from dependencies to optional peerDependencies so it is NOT installed by default — only required for Azure AD auth - Add runtime check in sqlserver driver: if Azure AD auth type is requested but @azure/identity is missing, throw a clear install instruction error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solution - For `azure-active-directory-default` (CLI/default auth), acquire token ourselves instead of delegating to tedious's internal `@azure/identity` - Strategy: try `DefaultAzureCredential` first, fall back to `az` CLI subprocess - Bypasses Bun resolving `@azure/identity` to browser bundle where `DefaultAzureCredential` is a non-functional stub - Also bypasses CJS/ESM `isTokenCredential` boundary mismatch - All 31 driver unit tests pass, verified against real Fabric endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken` when none supplied
The `azure-active-directory-access-token` branch passed
`token: config.token ?? config.access_token` to tedious. When neither
field was set on a connection (e.g. a `fabric-migration` entry that
declared the auth type but no token), tedious threw:
TypeError: The "config.authentication.options.token" property
must be of type string
This blocked any Fabric/MSSQL config that relied on ambient credentials
(Azure CLI / managed identity) but used the explicit
`azure-active-directory-access-token` type instead of the `default`
shorthand.
Refactor token acquisition (`DefaultAzureCredential` → `az` CLI
fallback) into a shared `acquireAzureToken()` helper used by both the
`default` path and the `access-token` path when no token was supplied.
Callers that pass an explicit token are unchanged.
Also harden `mock.module("node:child_process", ...)` in
`sqlserver-unit.test.ts` to spread the real module so sibling tests in
the same `bun test` run keep access to `spawn` / `exec` / `fork`.
Tests: 110 pass, 0 fail in `packages/drivers`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ry mode When `source` and `target` are both SQL queries, `resolveTableSources` wraps them as `__diff_source` / `__diff_target` CTEs and the executor prepends the combined `WITH …` block to every engine-emitted task. T-SQL and Fabric parse-bind every CTE body even when unreferenced, so a task routed to the source warehouse failed to resolve the target-only base table referenced inside the unused `__diff_target` CTE (and vice versa), producing `Invalid object name` errors from the wrong warehouse. Return side-specific prefixes from `resolveTableSources` alongside the combined one, and have the executor loop in `runDataDiff` pick the source or target prefix per task when `source_warehouse !== target_warehouse`. Same-warehouse behaviour is unchanged. Adds `data-diff-cte.test.ts` covering plain-name passthrough, both-query wrapping, side-specific CTE isolation, and CTE merging with engine-emitted `WITH` clauses (10 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 333a45c moved `@azure/identity` from `optionalDependencies` to `peerDependencies` with `optional: true` in `packages/drivers/package.json`, but the lockfile was not regenerated. That left CI under `--frozen-lockfile` broken and made fresh installs silently diverge from the committed state. Running `bun install` brings the lockfile in sync: `@azure/identity` is recorded as an optional peer, and its transitive pins (`@azure/msal-browser`, `@azure/msal-common`, `@azure/msal-node`) re-resolve to the versions required by `tedious` and `snowflake-sdk`, matching the reachable runtime surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
52fe09e to
1977232
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (4)
packages/opencode/src/altimate/native/connections/data-diff.ts (1)
815-820: Cross-warehouse detection is a purely textual compare.
sourceWarehouse !== targetWarehousetreatsundefinedvs an explicit name as "cross-warehouse" even when the explicit name is the default (first) warehouse. The side-specific CTE injection is a strict superset of the combined prefix, so correctness isn't affected — just a minor inefficiency (slightly narrower CTE sent). Not blocking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines 815 - 820, Summary: crossWarehouse currently does a raw textual compare (sourceWarehouse !== targetWarehouse) which can misclassify undefined vs an explicit default warehouse name as cross-warehouse. Fix: canonicalize both sides before comparing by resolving undefined/empty values to the effective/default warehouse name used elsewhere (use the same resolver used for connection/effective settings) instead of comparing raw params.source_warehouse and params.target_warehouse; update the variables referenced (sourceWarehouse, targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !== resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add one) rather than inventing a new default string.packages/drivers/test/sqlserver-unit.test.ts (1)
483-489: Assertion gap lets the "mixed named + unnamed" column-naming bug slip through.This test verifies
rowsfor{ name: "alice", "": [42] }but doesn't assertresult.columns. Under the current implementation (sqlserver.ts Lines 210-217), mixing a named column with multiple unnamed values (e.g.{ name: "alice", "": [42, 100] }) collapses the header to["col_0", "col_1", "col_2"]and dropsname. Adding a case with ≥ 2 unnamed values and assertingcolumnswould have caught that and will guard whichever fix lands.♻️ Suggested additional case
test("handles mix of named and unnamed columns", async () => { mockQueryResult = { recordset: [{ name: "alice", "": [42] }], } const result = await connector.execute("SELECT * FROM t") expect(result.rows).toEqual([["alice", 42]]) }) + + test("preserves named-column header when mixed with multiple unnamed columns", async () => { + // SELECT name, COUNT(*), SUM(x) FROM t → { name: "alice", "": [42, 100] } + mockQueryResult = { + recordset: [{ name: "alice", "": [42, 100] }], + } + const result = await connector.execute("SELECT name, COUNT(*), SUM(x) FROM t") + expect(result.rows).toEqual([["alice", 42, 100]]) + expect(result.columns[0]).toBe("name") + })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 483 - 489, Add a new test variant in packages/drivers/test/sqlserver-unit.test.ts that uses a recordset like { name: "alice", "": [42, 100] } (i.e., multiple unnamed values) and assert both result.rows and result.columns so the header preserves the named column "name" and includes generated column names for the unnamed values (e.g., ["name","col_1","col_2"]); this will surface the bug referenced around sqlserver.ts (logic near lines handling column mapping) and prevent regressions in connector.execute's column assembly.packages/opencode/test/altimate/data-diff-dialect.test.ts (1)
51-54: Consider expanding case-insensitivity coverage.The case-insensitivity block only exercises
sqlserverandpostgresql. Sincemssql,fabric, andmariadbgo through the same normalization path, a one-line additional assertion (e.g.,MSSQL→tsql,FABRIC→fabric) would guard against a future regression where only some branches get lowercased.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/altimate/data-diff-dialect.test.ts` around lines 51 - 54, Add assertions to the "handles uppercase input" test to cover other inputs that share the normalization path so case-insensitivity can't regress: update the test block in packages/opencode/test/altimate/data-diff-dialect.test.ts that calls warehouseTypeToDialect to also assert uppercase variants such as "MSSQL" -> "tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever expected dialects map in warehouseTypeToDialect) so all branches that are lowercased are exercised.packages/drivers/src/sqlserver.ts (1)
89-100:execSyncblocks the event loop duringconnect().
execSync("az account get-access-token ...")with a 15 s timeout runs synchronously on the main thread inside anasyncfunction — any other pending I/O on the runtime is stalled while Azure CLI spins up (typically 0.5–2 s, occasionally much longer). Prefer the asyncexecFile(orpromisify(exec)) so the token fetch cooperates with the event loop.♻️ Suggested change
- try { - const { execSync } = await import("node:child_process") - const out = execSync( - "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv", - { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] }, - ).trim() - if (out) token = out - } catch { - // az CLI not installed or not logged in - } + try { + const { execFile } = await import("node:child_process") + const { promisify } = await import("node:util") + const run = promisify(execFile) + const { stdout } = await run( + "az", + ["account", "get-access-token", "--resource", "https://database.windows.net/", "--query", "accessToken", "-o", "tsv"], + { encoding: "utf-8", timeout: 15000 }, + ) + const out = stdout.trim() + if (out) token = out + } catch { + // az CLI not installed or not logged in + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 89 - 100, The current token fetch inside connect() uses sync child process invocation (execSync) which blocks the event loop; replace it with an asynchronous call (e.g., use child_process.execFile or promisified exec via util.promisify) so the Azure CLI call runs without blocking. Locate the block that imports execSync and sets token (the token variable and the try/catch that calls "az account get-access-token ...") and change it to call the async API (await the execFile/promisified exec) with the same args and timeout, handle stdout.trim() to set token, and preserve the existing catch behavior for missing CLI or auth failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.opencode/skills/data-parity/SKILL.md:
- Around line 453-458: The fenced code block showing the pseudo-YAML config (the
block containing type: "fabric", host:
"<workspace-id>-<item-id>.datawarehouse.fabric.microsoft.com", database:
"<warehouse-name>", authentication: "azure-active-directory-default") is missing
a language tag; change the opening fence from ``` to ```yaml (or ```text) so
markdownlint MD040 is satisfied and the block is treated as YAML.
In `@packages/drivers/package.json`:
- Around line 20-33: Remove the redundant optional peer dependency entry for
"@azure/identity" from package.json: delete the "@azure/identity" key inside the
"peerDependencies" object and its corresponding entry in "peerDependenciesMeta"
since "mssql"/"tedious" already bundles it; update the package.json so only the
other dependencies (e.g., "mssql", "oracledb", "duckdb", "mongodb",
"@clickhouse/client") remain and no "@azure/identity" references exist under
"peerDependencies" or "peerDependenciesMeta".
In `@packages/drivers/src/sqlserver.ts`:
- Around line 53-54: The code calls rawAuth.toLowerCase() without ensuring
rawAuth is a string, causing a TypeError for non-string authentication values;
update the computation of authType in the sqlserver driver to first check typeof
rawAuth === "string" before using toLowerCase and looking up AUTH_SHORTHANDS,
e.g. only apply AUTH_SHORTHANDS[rawAuth.toLowerCase()] when rawAuth is a string,
otherwise pass rawAuth through (or leave undefined) so non-string auth blocks
(config.authentication) don't blow up at runtime; adjust the logic around
rawAuth, authType, and AUTH_SHORTHANDS to be type-safe and explicit.
- Around line 56-158: The code path that handles authType starting with
"azure-active-directory" (see authType variable and mssqlConfig.authentication
assignments) does not handle unknown subtypes and can leave mssqlConfig without
authentication; add a terminal else branch after the existing
azure-active-directory-* branches that throws a clear Error (e.g. mentioning the
unsupported authType) so unknown or misspelled Azure AD subtypes fail fast;
ensure the thrown error references authType so callers get an explicit message
instead of a later opaque tedious error.
- Around line 76-87: The empty catch around the `@azure/identity` import and
credential.getToken is hiding real authentication errors; change the catch to
capture the thrown error (e.g., catch (err) { azureIdentityErr = err }) and
preserve that error in a local variable, then either append its message/details
to the final thrown error or log it at debug level when token remains undefined;
update the block that currently throws the generic "install `@azure/identity` or
run az login" message to include the captured azureIdentityErr (or its
message/stack) so real AAD/authentication failures are visible while still
tolerating missing-module/browser-bundle cases.
- Around line 208-217: The current columns logic replaces known named headers
when sampleFlat is longer (mixed named + unnamed aggregates); change the
construction in the columns branch to preserve the existing named prefix
(namedKeys) and only synthesize column names for the extra flattened entries
(e.g., generate col_i for indices >= namedKeys.length up to
sampleFlat.length-1), falling back to result.recordset.columns only when both
namedKeys and sampleFlat are empty; update the conditional around
sampleFlat/namedKeys where columns is assigned (the expression that uses
sampleFlat, namedKeys, recordset, and result.recordset.columns) so known headers
are retained and only the tail is renamed.
- Around line 162-167: Remove the fallback to the global shared pool to avoid
reintroducing cross-database isolation; in the block that currently checks
MssqlConnectionPool, ensure that if MssqlConnectionPool is falsy you throw an
explicit error instead of calling mssql.connect, i.e., keep the branch that
constructs and connects a new MssqlConnectionPool(mssqlConfig) and replace the
else branch with a throw that references MssqlConnectionPool and mssql.connect
in the message so reviewers can quickly locate and understand the change.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 154-159: The test "empty result returns correctly" uses a fake
mssql response shape by setting mockQueryResult.recordset_columns, but the
driver reads result.recordset?.columns; update the test to match real mssql
shape by removing recordset_columns and instead set mockQueryResult.recordset =
[] and, if you need to include column metadata, set
mockQueryResult.recordset.columns = {} (or leave absent for empty results) so
the driver code that checks result.recordset?.columns and the code paths in
connector.execute behave against the real response shape.
---
Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 89-100: The current token fetch inside connect() uses sync child
process invocation (execSync) which blocks the event loop; replace it with an
asynchronous call (e.g., use child_process.execFile or promisified exec via
util.promisify) so the Azure CLI call runs without blocking. Locate the block
that imports execSync and sets token (the token variable and the try/catch that
calls "az account get-access-token ...") and change it to call the async API
(await the execFile/promisified exec) with the same args and timeout, handle
stdout.trim() to set token, and preserve the existing catch behavior for missing
CLI or auth failures.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 483-489: Add a new test variant in
packages/drivers/test/sqlserver-unit.test.ts that uses a recordset like { name:
"alice", "": [42, 100] } (i.e., multiple unnamed values) and assert both
result.rows and result.columns so the header preserves the named column "name"
and includes generated column names for the unnamed values (e.g.,
["name","col_1","col_2"]); this will surface the bug referenced around
sqlserver.ts (logic near lines handling column mapping) and prevent regressions
in connector.execute's column assembly.
In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 815-820: Summary: crossWarehouse currently does a raw textual
compare (sourceWarehouse !== targetWarehouse) which can misclassify undefined vs
an explicit default warehouse name as cross-warehouse. Fix: canonicalize both
sides before comparing by resolving undefined/empty values to the
effective/default warehouse name used elsewhere (use the same resolver used for
connection/effective settings) instead of comparing raw params.source_warehouse
and params.target_warehouse; update the variables referenced (sourceWarehouse,
targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !==
resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add
one) rather than inventing a new default string.
In `@packages/opencode/test/altimate/data-diff-dialect.test.ts`:
- Around line 51-54: Add assertions to the "handles uppercase input" test to
cover other inputs that share the normalization path so case-insensitivity can't
regress: update the test block in
packages/opencode/test/altimate/data-diff-dialect.test.ts that calls
warehouseTypeToDialect to also assert uppercase variants such as "MSSQL" ->
"tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever expected
dialects map in warehouseTypeToDialect) so all branches that are lowercased are
exercised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6589a3e0-5c1a-4ac6-bd8a-e405334413ca
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
.opencode/skills/data-parity/SKILL.mdpackages/drivers/package.jsonpackages/drivers/src/normalize.tspackages/drivers/src/sqlserver.tspackages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/src/altimate/native/connections/registry.tspackages/opencode/src/altimate/tools/data-diff.tspackages/opencode/test/altimate/connections.test.tspackages/opencode/test/altimate/data-diff-cte.test.tspackages/opencode/test/altimate/data-diff-dialect.test.tspackages/opencode/test/altimate/driver-normalize.test.ts
Fixes five correctness, reliability, and portability issues surfaced by the consensus code review of this branch. CRITICAL #1 — Cross-dialect partitioned diff (`data-diff.ts`): `runPartitionedDiff` built one partition WHERE clause with `sourceDialect` and passed it as shared `where_clause` to the recursive `runDataDiff`, which applied it to both warehouses identically. Cross-dialect partition mode (MSSQL → Postgres) failed because the target received T-SQL `DATETRUNC`/`CONVERT(DATE, …, 23)`. Now builds per-side WHERE using each warehouse's dialect and bakes it into dialect-quoted subquery SQL for source and target independently. The existing side-aware CTE injection handles the rest. MAJOR #2 — Azure AD token caching and refresh (`sqlserver.ts`): `acquireAzureToken` fetched a fresh token on every `connect()` and embedded it in the pool config with no refresh. Long-lived sessions silently failed when the ~1h token expired. Adds a module-scoped cache keyed by `(resource, client_id)` with proactive refresh 5 min before expiry, parsing `expiresOnTimestamp` from `@azure/identity` or the JWT `exp` claim from the `az` CLI fallback. Exposes `_resetTokenCacheForTests` for isolation. MAJOR #3 — `joindiff` + cross-warehouse guard (`data-diff.ts`): Explicit `algorithm: "joindiff"` combined with different warehouses produced broken SQL (one task referencing two CTE aliases with only one injected). Now returns an early error with a clear message steering users to `hashdiff` or `auto`. Cross-warehouse detection switched from warehouse-name string compare to dialect compare, matching the underlying SQL-divergence invariant. MAJOR #4 — Dialect-aware identifier quoting in CTE wrapping (`data-diff.ts`): `resolveTableSources` wrapped plain-table names with ANSI double-quotes for all dialects. T-SQL/Fabric require `QUOTED_IDENTIFIER ON` for this to work; default for `mssql`/tedious is ON, but user contexts (stored procs, legacy collations) can override. Now accepts source/target dialect parameters and delegates to `quoteIdentForDialect`, which was hoisted to module scope so it can be reused across partition and CTE paths. MAJOR #5 — Configurable Azure resource URL (`sqlserver.ts`, `normalize.ts`): Token acquisition hardcoded `https://database.windows.net/`, blocking Azure Government, Azure China, and sovereign-cloud customers. Now honours an explicit `azure_resource_url` config field and otherwise infers the URL from the host suffix (`.usgovcloudapi.net`, `.chinacloudapi.cn`). Adds the usual camelCase/snake_case aliases in the SQL Server normalizer. Also surfaces Azure auth error causes: if both `@azure/identity` and `az` CLI fail, the thrown error includes both hints (redacted) so users know why rather than seeing the generic "install @azure/identity or run az login" message. Tests: adds `data-diff-cross-dialect.test.ts` covering the cross-dialect partition WHERE routing and the `joindiff` guard; extends `data-diff-cte.test.ts` with dialect-aware quoting assertions for tsql, fabric, and mysql; extends `sqlserver-unit.test.ts` with cache hit / expiry refresh / client-id keyed cache tests, commercial/gov/china/custom resource URL resolution, and the combined-error-hints surface. All 41 sqlserver driver tests, 24 data-diff orchestrator tests, and 214 normalize/connections tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds first-class SQL Server (MSSQL) and Microsoft Fabric support to the data-parity/data-diff workflow, including dialect mapping to Rust engine dialect names, cross-warehouse-safe CTE injection, and expanded driver authentication/normalization/docs.
Changes:
- Add SQL Server driver capabilities (Azure AD auth flows, TOP injection, schema introspection, result shaping) and upgrade
mssqlto v12. - Add warehouse-type → Rust dialect mapping (
sqlserver/mssql→tsql,fabric→fabric, etc.) plus dialect-aware quoting and side-specific CTE injection for SQL-query mode and partitioned diffs. - Extend registry/normalization/tests/docs to recognize
fabricas a supported connection type routed through the SQL Server driver.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/opencode/test/altimate/driver-normalize.test.ts | Adds coverage ensuring fabric configs normalize using SQL Server aliasing. |
| packages/opencode/test/altimate/data-diff-dialect.test.ts | Adds tests for warehouse-type → Rust dialect mapping. |
| packages/opencode/test/altimate/data-diff-cte.test.ts | Adds tests for query detection, dialect-aware quoting, and side-specific CTE injection. |
| packages/opencode/test/altimate/data-diff-cross-dialect.test.ts | Adds tests for cross-dialect partition filtering and joindiff cross-warehouse guard behavior. |
| packages/opencode/test/altimate/connections.test.ts | Adds coverage for fabric being recognized/registered as a connection type. |
| packages/opencode/src/altimate/tools/data-diff.ts | Adjusts diff output formatting for missing diff row values. |
| packages/opencode/src/altimate/native/connections/registry.ts | Adds fabric → sqlserver driver routing. |
| packages/opencode/src/altimate/native/connections/data-diff.ts | Implements dialect mapping, dialect-aware quoting, side-specific CTE injection, and cross-dialect partition handling. |
| packages/drivers/test/sqlserver-unit.test.ts | Adds extensive unit tests for the SQL Server driver, auth flows, and result shaping. |
| packages/drivers/src/sqlserver.ts | Implements SQL Server driver with Azure AD token acquisition/caching, ConnectionPool isolation, and result shaping. |
| packages/drivers/src/normalize.ts | Adds SQL Server/Fabric config alias normalization for auth and Azure fields. |
| packages/drivers/package.json | Upgrades mssql dependency and adds optional peer dependency on @azure/identity. |
| bun.lock | Updates lockfile for mssql/tedious upgrade and peer dependency metadata. |
| .opencode/skills/data-parity/SKILL.md | Documents SQL Server/Fabric configuration, requirements, and behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
packages/drivers/src/sqlserver.ts (2)
255-262:⚠️ Potential issue | 🟡 MinorFallback to
mssql.connect()still defeats per-pool isolation.This was raised in a prior review and remains unresolved. Since
packages/drivers/package.jsonnow pinsmssql@^12.0.0whereConnectionPoolis a stable named export,MssqlConnectionPoolshould never be falsy in practice; if it somehow is, silently using the global shared pool re-introduces the cross-database interference bug the explicitConnectionPoolpath was added to fix. Throw instead.🔒️ Proposed fix
- if (MssqlConnectionPool) { - pool = new MssqlConnectionPool(mssqlConfig) - await pool.connect() - } else { - pool = await mssql.connect(mssqlConfig) - } + if (!MssqlConnectionPool) { + throw new Error( + "mssql.ConnectionPool is not available — the installed `mssql` package is too old. Upgrade to mssql@^12.", + ) + } + pool = new MssqlConnectionPool(mssqlConfig) + await pool.connect()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 255 - 262, The fallback branch that calls mssql.connect() reintroduces the global shared pool bug; replace the fallback with a hard failure instead of silently using the global pool: when MssqlConnectionPool is falsy, throw a descriptive error indicating ConnectionPool is required (referencing MssqlConnectionPool, ConnectionPool, and the pool initialization logic) so callers know the driver cannot operate without the explicit ConnectionPool; do not call mssql.connect() anywhere in this module.
302-312:⚠️ Potential issue | 🟡 MinorMixed named + multi-unnamed columns still drop named headers.
Already flagged in an earlier review and not addressed. For
SELECT name, COUNT(*), SUM(x) FROM t(recordset row{ name: "alice", "": [42, 100] }),namedKeys.lengthis 2 butsampleFlat.lengthis 3, so the branch falls through to synthesizingcol_0..col_2, losing the realnameheader. Preserve the known prefix and only synthesize names for the flattened tail.♻️ Proposed fix
- const rows = limitedRecordset.map(flattenRow) - const sampleFlat = rows.length > 0 ? rows[0] : [] - const namedKeys = recordset.length > 0 ? Object.keys(recordset[0]) : [] - const columns = - namedKeys.length === sampleFlat.length - ? namedKeys - : sampleFlat.length > 0 - ? sampleFlat.map((_: any, i: number) => `col_${i}`) - : (result.recordset?.columns - ? Object.keys(result.recordset.columns) - : []) + const rows = limitedRecordset.map(flattenRow) + const sampleFlat = rows.length > 0 ? rows[0] : [] + const sampleRow = recordset[0] ?? {} + const columns: string[] = [] + if (sampleFlat.length === 0) { + if (result.recordset?.columns) columns.push(...Object.keys(result.recordset.columns)) + } else { + let i = 0 + for (const [k, v] of Object.entries(sampleRow)) { + if (k === "" && Array.isArray(v)) { + for (let j = 0; j < v.length; j++) columns.push(`col_${i++}`) + } else { + columns.push(k === "" ? `col_${i}` : k) + i++ + } + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 302 - 312, The code currently drops known named headers when some columns are flattened (variables: limitedRecordset, flattenRow, rows, sampleFlat, namedKeys, columns), so change the columns construction to preserve the namedKeys prefix and only synthesize names for any extra flattened entries: if namedKeys.length === sampleFlat.length use namedKeys; if namedKeys.length < sampleFlat.length set columns = [...namedKeys, ...sampleFlat.slice(namedKeys.length).map((_, i) => `col_${namedKeys.length + i}`)]; otherwise (sampleFlat shorter) keep the current fallback logic that uses sampleFlat or result.recordset?.columns as before. Ensure this logic replaces the current ternary chain building columns.packages/drivers/test/sqlserver-unit.test.ts (1)
176-181:⚠️ Potential issue | 🟡 Minor
recordset_columnsis still not a real mssql shape.Previously flagged; the key remains incorrect.
mssqlexposes metadata atresult.recordset.columns, not as a siblingrecordset_columns, and the driver (seesqlserver.tsline 310) readsresult.recordset?.columns. The test passes only becauserecordsetis empty, so future readers will copy the wrong pattern.💚 Proposed fix
- mockQueryResult = { recordset: [], recordset_columns: {} } + const recordset: any = [] + recordset.columns = {} + mockQueryResult = { recordset }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 176 - 181, The test constructs a fake mssql result with the wrong metadata shape—using mockQueryResult.recordset_columns instead of the real mssql shape result.recordset.columns—so update the test ("empty result returns correctly" in packages/drivers/test/sqlserver-unit.test.ts) to build mockQueryResult with recordset: [] and an empty columns object under recordset.columns (i.e., mockQueryResult.recordset = Object.assign([], { columns: {} })) so the driver code in sqlserver.ts (which reads result.recordset?.columns) exercises the correct shape when calling connector.execute("SELECT * FROM t").
🧹 Nitpick comments (3)
packages/opencode/test/altimate/data-diff-cross-dialect.test.ts (1)
114-127: Strengthen the same-warehouse joindiff positive-path assertions.This test only checks
result.success === true, which passes as long as no guard fires — it won’t catch regressions where joindiff silently degrades (e.g., falls back to hashdiff, skips execution, or produces empty SQL). Since the negative test above assertssqlLog.length === 0, the symmetric positive assertion is cheap and much stronger.🧪 Suggested additional assertions
const result = await runDataDiff({ source: "dbo.orders", target: "dbo.orders_v2", key_columns: ["id"], source_warehouse: "shared", target_warehouse: "shared", algorithm: "joindiff", }) expect(result.success).toBe(true) + // Joindiff must actually dispatch SQL to the shared warehouse + expect(sqlLog.length).toBeGreaterThan(0) + expect(sqlLog.every((x) => x.warehouse === "shared")).toBe(true)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts` around lines 114 - 127, The test "same-warehouse joindiff is allowed" currently only checks result.success; strengthen it by also asserting that the joindiff actually executed SQL instead of silently degrading: after calling runDataDiff (the test using Registry.setConfigs and runDataDiff), assert that sqlLog (the request/SQL capture used in other tests) contains entries (e.g., sqlLog.length > 0) and that the executed statement or logged payload indicates joindiff behavior (e.g., contains "JOIN" or the joindiff-specific marker), so failures where joindiff falls back or produces no SQL are caught.packages/drivers/src/sqlserver.ts (2)
170-186:execSyncblocks the event loop inside an async path.The Azure CLI fallback uses synchronous
execSyncwith a 15-second timeout insideacquireAzureToken(), which will stall the entire event loop while waiting foraz account get-access-token. PreferexecFilepromisified (orspawn) so that other tasks (e.g. in-flight pool requests, other connectors initializing in parallel) continue to progress. Also, passing the command as a single formatted string throughexecSyncis a minor shell-injection concern ifresourceUrlis ever derived from untrusted input —execFilewith an argv array avoids that entirely.♻️ Suggested approach
- try { - const { execSync } = await import("node:child_process") - const out = execSync( - `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`, - { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] }, - ).trim() + try { + const { execFile } = await import("node:child_process") + const { promisify } = await import("node:util") + const execFileP = promisify(execFile) + const { stdout } = await execFileP( + "az", + ["account", "get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o", "tsv"], + { encoding: "utf-8", timeout: 15000 }, + ) + const out = stdout.trim() if (out) { token = out expiresAt = parseTokenExpiry(out) } } catch (err: any) { azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim() }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 170 - 186, The use of synchronous execSync inside acquireAzureToken blocks the event loop; replace it with a non-blocking child process call (e.g., promisified execFile or spawn) to run "az" with arguments ["account","get-access-token","--resource", resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr asynchronously, enforce the 15s timeout via timers or the child process API, and on success set token and expiresAt (via parseTokenExpiry(out)); on failure capture stderr/message into azCliStderr (trim to 200 chars) and preserve existing error-handling flow so behavior of token, expiresAt, and azCliStderr remains the same.
140-204: Concurrent cache-miss calls will parallel-fetch the same token.If two
connect()calls for the same(resource, clientId)start before either populates the cache, both will independently runDefaultAzureCredential.getToken()(or theazCLI) and both write to the cache. This is wasteful for AAD throttling and surprising during app startup where several diffs connect in parallel. Caching the in-flightPromise<string>(rather than the resolved token) de-duplicates concurrent acquisitions for free — and on rejection you simply evict and let the next caller retry.const inflight = new Map<string, Promise<string>>() // inside acquireAzureToken: const existing = inflight.get(cacheKey) if (existing) return existing const p = (async () => { /* existing acquisition body */ })() .finally(() => inflight.delete(cacheKey)) inflight.set(cacheKey, p) return p🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 140 - 204, acquireAzureToken currently allows concurrent cache-miss callers to run duplicate AAD/az CLI fetches; introduce an inflight Map<string, Promise<string>> (e.g., inflight) keyed by cacheKey and at the start of acquireAzureToken return an existing inflight.get(cacheKey) if present, otherwise wrap the entire acquisition logic in an async Promise, set inflight.set(cacheKey, promise) before awaiting it, and in promise.finally() delete the inflight entry so failed attempts can be retried; keep using tokenCache as before (check cached token first, write resolved token into tokenCache) and on rejection ensure the inflight entry is removed so subsequent callers will create a new fetch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 576-596: The "different clientIds cache separately" test currently
only asserts both connections received "shared-token" which doesn't prove
separate caching; modify the test in the test block named "different clientIds
cache separately" to either (1) make azureIdentityState.tokenOverride return
distinct tokens per call (so the first connect yields "token-1" and the second
"token-2") and assert mockConnectCalls[0].authentication.options.token and
mockConnectCalls[1].authentication.options.token differ accordingly, or (2) keep
the shared-token but assert the underlying getToken mock was invoked twice (e.g.
check the getToken invocation counter after resetMocks and the two connect()
calls equals 2) to prove cache keying is per azure_client_id; refer to
azureIdentityState.tokenOverride, resetMocks(), connect(...), and
mockConnectCalls to implement the change.
In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 27-42: The module-level mocks created with
mock.module("@altimateai/altimate-core", ...) (and the registry mock) can leak
across tests; add cleanup by calling mock.restoreModule() in an afterEach (or
afterAll) to restore original module exports, or replace the module-level
mock.module usage with scoped spyOn calls inside the test suite setup (e.g.,
inside beforeEach) so the mock is not process-scoped; update the test file to
call mock.restoreModule() after each test or convert the mocks to spyOn to
prevent cross-file leakage.
---
Duplicate comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 255-262: The fallback branch that calls mssql.connect()
reintroduces the global shared pool bug; replace the fallback with a hard
failure instead of silently using the global pool: when MssqlConnectionPool is
falsy, throw a descriptive error indicating ConnectionPool is required
(referencing MssqlConnectionPool, ConnectionPool, and the pool initialization
logic) so callers know the driver cannot operate without the explicit
ConnectionPool; do not call mssql.connect() anywhere in this module.
- Around line 302-312: The code currently drops known named headers when some
columns are flattened (variables: limitedRecordset, flattenRow, rows,
sampleFlat, namedKeys, columns), so change the columns construction to preserve
the namedKeys prefix and only synthesize names for any extra flattened entries:
if namedKeys.length === sampleFlat.length use namedKeys; if namedKeys.length <
sampleFlat.length set columns = [...namedKeys,
...sampleFlat.slice(namedKeys.length).map((_, i) => `col_${namedKeys.length +
i}`)]; otherwise (sampleFlat shorter) keep the current fallback logic that uses
sampleFlat or result.recordset?.columns as before. Ensure this logic replaces
the current ternary chain building columns.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 176-181: The test constructs a fake mssql result with the wrong
metadata shape—using mockQueryResult.recordset_columns instead of the real mssql
shape result.recordset.columns—so update the test ("empty result returns
correctly" in packages/drivers/test/sqlserver-unit.test.ts) to build
mockQueryResult with recordset: [] and an empty columns object under
recordset.columns (i.e., mockQueryResult.recordset = Object.assign([], {
columns: {} })) so the driver code in sqlserver.ts (which reads
result.recordset?.columns) exercises the correct shape when calling
connector.execute("SELECT * FROM t").
---
Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 170-186: The use of synchronous execSync inside acquireAzureToken
blocks the event loop; replace it with a non-blocking child process call (e.g.,
promisified execFile or spawn) to run "az" with arguments
["account","get-access-token","--resource",
resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr
asynchronously, enforce the 15s timeout via timers or the child process API, and
on success set token and expiresAt (via parseTokenExpiry(out)); on failure
capture stderr/message into azCliStderr (trim to 200 chars) and preserve
existing error-handling flow so behavior of token, expiresAt, and azCliStderr
remains the same.
- Around line 140-204: acquireAzureToken currently allows concurrent cache-miss
callers to run duplicate AAD/az CLI fetches; introduce an inflight Map<string,
Promise<string>> (e.g., inflight) keyed by cacheKey and at the start of
acquireAzureToken return an existing inflight.get(cacheKey) if present,
otherwise wrap the entire acquisition logic in an async Promise, set
inflight.set(cacheKey, promise) before awaiting it, and in promise.finally()
delete the inflight entry so failed attempts can be retried; keep using
tokenCache as before (check cached token first, write resolved token into
tokenCache) and on rejection ensure the inflight entry is removed so subsequent
callers will create a new fetch.
In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 114-127: The test "same-warehouse joindiff is allowed" currently
only checks result.success; strengthen it by also asserting that the joindiff
actually executed SQL instead of silently degrading: after calling runDataDiff
(the test using Registry.setConfigs and runDataDiff), assert that sqlLog (the
request/SQL capture used in other tests) contains entries (e.g., sqlLog.length >
0) and that the executed statement or logged payload indicates joindiff behavior
(e.g., contains "JOIN" or the joindiff-specific marker), so failures where
joindiff falls back or produces no SQL are caught.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dd24dc82-bab2-4760-ab04-8fc4b312c420
📒 Files selected for processing (6)
packages/drivers/src/normalize.tspackages/drivers/src/sqlserver.tspackages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/test/altimate/data-diff-cross-dialect.test.tspackages/opencode/test/altimate/data-diff-cte.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/drivers/src/normalize.ts
- packages/opencode/test/altimate/data-diff-cte.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
There was a problem hiding this comment.
3 issues found across 6 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:50">
P2: Normalize explicit `azure_resource_url` to include a trailing slash before appending `.default`, otherwise `getToken` can receive an invalid scope string.</violation>
</file>
<file name="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts">
<violation number="1" location="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts:97">
P2: Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.</violation>
</file>
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:870">
P1: `crossWarehouse` is derived from dialect equality, so joindiff is incorrectly allowed across different warehouses that use the same dialect.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
…lot)
Addresses the remaining issues raised by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer on top of the multi-model consensus fix.
### CRITICAL
- **`@azure/identity` peer dep removed** (`drivers/package.json`)
`mssql@12` → `tedious@19` bundles `@azure/identity ^4.2.1` as a regular
dependency. Declaring it here as an optional peer was redundant and caused
transitive-version-drift concerns. Users get the correct version
automatically through the tedious chain; our lazy import handles the
browser-bundle edge case itself.
### MAJOR
- **Cross-dialect date partition literal normalization** (`data-diff.ts`)
`buildPartitionDiscoverySQL` on MSSQL returns a JS `Date` object, stringified
upstream as `"Mon Jan 01 2024 …"`. `CONVERT(DATE, …, 23)` rejects that format.
Normalize `partitionValue` to ISO `yyyy-mm-dd` before dialect casting so the
T-SQL/Fabric path works end-to-end on dates discovered from MSSQL sources.
- **`crossWarehouse` uses resolved warehouse identity** (`data-diff.ts`)
Previous commit gated on dialect compare, which treated two independent
MSSQL instances as "same warehouse" and would have let `joindiff` route a
JOIN through a warehouse that can't resolve the other side's base tables.
Now resolves both sides' warehouse name (falling back to the default
warehouse when a side is omitted) and compares identities — identity-based
gating handles both the "undefined vs default" case (cubic) and the
"same-dialect, different instance" case (Copilot).
- **Drop `mssql.connect()` fallback** (`sqlserver.ts`)
`mssql@^12` guarantees `ConnectionPool` as a named export. The fallback
silently re-introduced the global-shared-pool bug this branch was added
to fix. Now throws a descriptive error if `ConnectionPool` is missing —
cross-database pool interference cannot regress.
- **Non-string `config.authentication` guarded** (`sqlserver.ts`)
Caller passing a pre-built `{ type, options }` block (or `null`) previously
crashed with `TypeError: rawAuth.toLowerCase is not a function`. Now only
applies the shorthand lookup when `rawAuth` is a string; other values pass
through so tedious can handle them or reject them with its own error.
- **Unknown `azure-active-directory-*` subtype fails fast** (`sqlserver.ts`)
Typos or future tedious subtypes previously dropped through all `else if`
branches, producing a config with `encrypt: true` but no `authentication`
block. tedious then surfaced an opaque error far from the root cause.
Now throws with the offending subtype and the supported list.
- **`execSync` replaced with async `exec`** (`sqlserver.ts`)
The `az account get-access-token` CLI fallback previously blocked the
event loop for up to 15s. Switched to `util.promisify(exec)` so the
connection path stays non-blocking.
- **Mixed named + unnamed column derivation preserves headers** (`sqlserver.ts`)
Previously `SELECT name, COUNT(*), SUM(x)` produced either `["name", ""]`
(blank header) or `["col_0", "col_1", "col_2"]` (lost `name`). Rewrote
column/row derivation to iterate in one pass, preserving known named
columns and synthesizing `col_N` only for expanded `""`-key positions.
### MINOR
- **`(no values)` fallback for empty `diff_row.values` array** (`tools/data-diff.ts`)
`[].join(" | ") ?? "(no values)"` never fires because `""` is falsy-but-not-
nullish. Gate on `d.values?.length` instead.
### Test / docs
- `sqlserver-unit.test.ts`: token-cache client-id test now counts actual
`getToken` invocations (previous version only verified both got the same
mocked token, which proved nothing about keying).
- `sqlserver-unit.test.ts`: "empty result" test now mirrors the real mssql
shape (`recordset.columns` is a property *on* the recordset array, not a
sibling key).
- `sqlserver-unit.test.ts`: added mixed-column regression tests — "name +
COUNT + SUM" and "single unnamed column" — to lock in the derivation fix.
- `sqlserver-unit.test.ts`: stubbed async `exec` via `util.promisify.custom`
so tests drive both the `execSync` legacy path and the new async path.
- `SKILL.md`: Fabric config fenced block now declares `yaml` (markdownlint
MD040).
All tests: 43/43 sqlserver driver + 238/238 opencode test suite.
Attribution: findings identified by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/drivers/src/sqlserver.ts (1)
147-218: Consider single-flight dedup for concurrent cold-cache token acquisitions.If two (or many)
connect()calls for the same(resourceUrl, clientId)race before the first token lands in the cache, each call runs the fullDefaultAzureCredential→azpath independently — exactly the "wasteful, risks throttling" scenario the cache comment flags. A small in-flight map (Map<string, Promise<string>>) next totokenCachecollapses concurrent cold requests to a single network roundtrip.♻️ Suggested approach
const tokenCache = new Map<string, { token: string; expiresAt: number }>() +const inflightTokenRequests = new Map<string, Promise<string>>()Then in
acquireAzureToken:const acquireAzureToken = async (): Promise<string> => { const cached = tokenCache.get(cacheKey) if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) { return cached.token } + const existing = inflightTokenRequests.get(cacheKey) + if (existing) return existing + const p = (async () => { + // ... existing acquisition + cache-set logic ... + return token! + })().finally(() => inflightTokenRequests.delete(cacheKey)) + inflightTokenRequests.set(cacheKey, p) + return p }Low priority — only matters when multiple pools spin up concurrently at process start — but it's cheap insurance for the documented motivation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 147 - 218, acquireAzureToken can run multiple concurrent cold-cache fetches; add a single-flight map (e.g. const inFlight = new Map<string, Promise<string>>()) alongside tokenCache and use cacheKey as the key: at start of acquireAzureToken check if inFlight.has(cacheKey) and return the stored Promise; if not, create the Promise that performs the existing token acquisition logic (DefaultAzureCredential path then az CLI fallback), store it in inFlight before awaiting internals, and ensure you remove(inFlight) in both fulfillment and rejection handlers so failures don't permanently block retries; keep existing tokenCache.set and TOKEN_REFRESH_MARGIN_MS logic unchanged.packages/drivers/test/sqlserver-unit.test.ts (1)
119-131: Optional: drop theexecSyncmock branch now that production uses asyncexec.The driver no longer calls
execSync(it was replaced withpromisify(exec)inpackages/drivers/src/sqlserver.tsLine 183-189). TheexecSyncstub here is dead mock-surface — keeping it risks implyingexecSyncis still a supported fallback path in the driver.♻️ Suggested cleanup
mock.module("node:child_process", () => ({ ...realChildProcess, - execSync: (cmd: string) => { - cliState.lastCmd = cmd - if (cliState.throwError) { - const e: any = new Error(cliState.throwError.message ?? "az failed") - e.stderr = cliState.throwError.stderr - throw e - } - return cliState.output - }, exec: execStub, }))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 119 - 131, Remove the dead execSync branch from the test mock: in the mock.module call that currently spreads realChildProcess and defines execSync and exec, delete the execSync property and its closure (the cliState handling and thrown Error creation) so only exec: execStub remains; keep the realChildProcess spread and existing execStub usage (referencing mock.module, execStub, and cliState) to reflect that production now uses the async promisified exec path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 186-200: The current call uses execAsync with a shell-interpolated
command including resourceUrl (user input), risking shell injection; replace
that usage by invoking execFile (or your promisified execFileAsync) to call "az"
with args ["account", "get-access-token", "--resource", resourceUrl, "--query",
"accessToken", "-o", "tsv"] so the value is passed as an argv element (no
shell), retain the same options (encoding, timeout) and preserve the existing
handling of stdout -> out -> token/expiresAt and azCliStderr in the catch; also
update the related test stub to mock execFile/execFileAsync instead of execAsync
or assert the argv array.
In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 810-820: runPartitionedDiff currently replaces
params.source/params.target with SELECT subqueries and then calls runDataDiff,
which prevents discoverExtraColumns from recognizing real table names and causes
extra_columns to be lost; fix this by running discoverExtraColumns (and
capturing discoveredExcluded) inside runPartitionedDiff while
params.source/params.target are still plain table names, then pass the
discovered extra_columns into the recursive runDataDiff call (e.g., set
params.extra_columns and params.excluded_audit_columns or forward
discoveredExcluded) instead of letting runDataDiff attempt discovery on the
SELECT strings; ensure you do not set partition_column in the recursive call as
before.
---
Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 147-218: acquireAzureToken can run multiple concurrent cold-cache
fetches; add a single-flight map (e.g. const inFlight = new Map<string,
Promise<string>>()) alongside tokenCache and use cacheKey as the key: at start
of acquireAzureToken check if inFlight.has(cacheKey) and return the stored
Promise; if not, create the Promise that performs the existing token acquisition
logic (DefaultAzureCredential path then az CLI fallback), store it in inFlight
before awaiting internals, and ensure you remove(inFlight) in both fulfillment
and rejection handlers so failures don't permanently block retries; keep
existing tokenCache.set and TOKEN_REFRESH_MARGIN_MS logic unchanged.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 119-131: Remove the dead execSync branch from the test mock: in
the mock.module call that currently spreads realChildProcess and defines
execSync and exec, delete the execSync property and its closure (the cliState
handling and thrown Error creation) so only exec: execStub remains; keep the
realChildProcess spread and existing execStub usage (referencing mock.module,
execStub, and cliState) to reflect that production now uses the async
promisified exec path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 2c693281-52dc-47f7-b49c-a79a1d7331cd
📒 Files selected for processing (6)
.opencode/skills/data-parity/SKILL.mdpackages/drivers/package.jsonpackages/drivers/src/sqlserver.tspackages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/src/altimate/tools/data-diff.ts
✅ Files skipped from review due to trivial changes (2)
- packages/drivers/package.json
- .opencode/skills/data-parity/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opencode/src/altimate/tools/data-diff.ts
Commit 38cfb0e removed `@azure/identity` from the drivers package's `peerDependencies` (tedious already bundles it), but the lockfile's `packages/drivers` workspace section still carried the corresponding `peerDependencies` and `optionalPeers` blocks. CI running `bun install --frozen-lockfile` would fail on the drift. Minimal edit — just removes the two stale blocks. No resolution changes (`bun install --frozen-lockfile` passes with "no changes"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:605">
P1: The new date parsing rewrites MySQL/MariaDB week partition values (`YYYY-WW`) into `YYYY-MM-DD`, causing partition WHERE clauses that never match.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
The prior integration-style test mocked the Registry module globally with
`mock.module(".../registry", ...)`, which leaks across all test files in
bun:test's single-process runner. That caused 14 unrelated tests in
`connections.test.ts`, `telemetry-safety.test.ts`, and
`dbt-first-execution.test.ts` to fail in CI.
Additionally, the test relied on `mock.module("@altimateai/altimate-core")`
to supply a fake `DataParitySession`. The npm-published 0.2.6 of that
package does not export `DataParitySession` (sessions are only in the
locally-built `altimate-core-internal` binary), and Bun's `mock.module`
cannot override a package that another test file has already imported —
so the integration test was structurally unreliable.
Resolution:
1. **Export pure SQL-builder helpers** from `data-diff.ts`
(`dateTruncExpr`, `buildPartitionWhereClause`) and unit-test them
directly. No module mocking required; the test directly exercises the
logic the CRITICAL/MAJOR fix changed.
2. **Move the `joindiff` + cross-warehouse guard earlier** in `runDataDiff`
— before the NAPI import. Semantically identical for callers (guard
still fires, same error message, `steps: 0`), but now it can be
integration-tested without any NAPI mock. Preserves end-to-end
wiring coverage for the guard.
3. **Rewrite `data-diff-cross-dialect.test.ts`** as pure-function unit
tests for the partition WHERE logic + a real `runDataDiff` call for
the joindiff guard. No more cross-file mock pollution.
Functionality unchanged:
- `runDataDiff` behavior for real callers is identical. The only
observable difference is error-ordering: if a caller simultaneously
omits NAPI and passes `joindiff + cross-warehouse`, they now get the
"joindiff requires same warehouse" error instead of the NAPI-missing
error. That's strictly better UX — NAPI availability is a deployment
concern, `joindiff`+cross-warehouse is a user error.
- `buildPartitionWhereClause` and `dateTruncExpr` are now exported but
semantically unchanged — same inputs, same outputs.
Test results:
- 2821 altimate tests pass, 0 fail
- 43 sqlserver driver tests pass, 0 fail
- The 19 remaining full-suite failures (`mcp/`, `tool/project-scan`,
`plan-approval-phrase`) are pre-existing on `main` and unrelated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/MINOR) Addresses 5 substantive issues raised by the latest round of bot reviews. ### P1 / MAJOR - **MySQL/MariaDB week-partition values no longer corrupted** (cubic P1, data-diff.ts:610) — the prior ISO `yyyy-mm-dd` normalization applied to every dialect silently rewrote MySQL `DATE_FORMAT(%Y-%u)` outputs like `"2024-42"` into invalid dates, producing WHERE clauses that never match. Scope the normalization to T-SQL / Fabric only — those use `CONVERT(DATE, …, 23)` which is the only code path that requires ISO. Postgres, MySQL, ClickHouse, BigQuery, Oracle all get the raw value verbatim, matching their own `DATE_TRUNC`/`toStartOf*` output. - **Partitioned diff no longer drops extra_columns** (coderabbit MAJOR, data-diff.ts:824) — the partition fix wraps each side as a SELECT subquery before recursing. `discoverExtraColumns` skips SQL queries (only inspects plain table names), so the recursive `runDataDiff` fell through to key-only comparison, silently losing value-level diffs. Now `runPartitionedDiff` runs discovery ONCE on the plain source table up-front and passes the resolved `extra_columns` explicitly to each recursive call. Audit-column exclusion metadata is also propagated to the aggregated result for user reporting. ### P2 / MINOR - **`azure_resource_url` trailing slash normalized** (cubic P2, sqlserver.ts:50) — an explicit `"https://custom-host"` (no slash) would produce an invalid OAuth scope `"https://custom-host.default"`. Enforce a trailing slash in `resolveAzureResourceUrl`. - **`az account get-access-token` uses `execFile`** (coderabbit, sqlserver.ts:200) — replaces `exec(<shell command string>)` with `execFile("az", [args])` so user-supplied `azure_resource_url` can't introduce shell metacharacters into the command string. Also updates the test harness to stub both `exec` and `execFile`. ### Test isolation / coverage - **Added same-dialect cross-warehouse joindiff test** (cubic, data-diff-cross-dialect.test.ts:97) — two MSSQL servers with different hosts must still be gated by the joindiff guard; previous tests only exercised mixed dialects. - **Added MySQL week-partition regression tests** — prevent future revivals of the dialect-unaware ISO rewrite. - **Added trailing-slash `azure_resource_url` test.** Test results: - 44/44 sqlserver driver tests pass - 2824/2824 altimate tests pass, 0 fail - Remaining full-suite failures (`mcp/`, `tool/project-scan`, `plan-approval-phrase`) are pre-existing on `main`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/drivers/src/sqlserver.ts (1)
316-328:⚠️ Potential issue | 🟠 MajorPreserve
DISTINCT/ALLplacement when injectingTOP.Line 325 inserts
TOPimmediately afterSELECT, producing invalid T-SQL:SELECT DISTINCT col FROM tbecomesSELECT TOP 1001 DISTINCT col FROM t. Per Microsoft's official syntax,DISTINCT/ALLmust precedeTOP:SELECT [ ALL | DISTINCT ] [ TOP ( expression ) ] .... Update the regex to capture and preserve these quantifiers, insertingTOPafter them.🐛 Proposed fix
let query = sql const isSelectLike = /^\s*SELECT\b/i.test(sql) + const hasTop = /^\s*SELECT\s+(?:ALL\s+|DISTINCT\s+)?TOP\b/i.test(sql) // SQL Server uses TOP, not LIMIT if ( isSelectLike && effectiveLimit && - !/\bTOP\b/i.test(sql) && + !hasTop && !/\bLIMIT\b/i.test(sql) ) { - // Insert TOP after SELECT + // Insert TOP after SELECT, preserving SELECT DISTINCT/ALL syntax. query = sql.replace( - /^(\s*SELECT\s)/i, - `$1TOP ${effectiveLimit + 1} `, + /^(\s*SELECT\s+)((?:ALL|DISTINCT)\s+)?/i, + (_match, selectPrefix, quantifier = "") => + `${selectPrefix}${quantifier}TOP ${effectiveLimit + 1} `, ) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/drivers/src/sqlserver.ts` around lines 316 - 328, The injection of TOP after SELECT in the SQL transform mistakenly places TOP before any optional quantifier (DISTINCT/ALL); update the replacement logic in the block that checks isSelectLike/effectiveLimit (the code using sql.replace) to capture an optional ALL|DISTINCT token (and following whitespace) and insert `TOP ${effectiveLimit + 1}` after that token when present, i.e., change the regex so it matches `SELECT` plus optional `DISTINCT|ALL` group and use replacement groups to preserve the quantifier positioning and spacing while inserting TOP in the correct spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 54-66: In resolveAzureResourceUrl, validate that
config.azure_resource_url is a string before calling string methods: if the
explicit value exists but is not a string, throw a clear configuration error;
otherwise use the explicit string, or fallback to the existing host-based
detection (using config.host) and then ensure the returned URL ends with a
slash. This prevents raw.endsWith is not a function by checking typeof
config.azure_resource_url === "string" (and throwing) before using raw.endsWith
and preserves current behavior of resolveAzureResourceUrl.
- Around line 151-173: The token cache and acquisition ignore
config.azure_tenant_id, causing tenant cross-use; update the cacheKey in
acquireAzureToken to include config.azure_tenant_id (e.g.,
`${resourceUrl}|${clientId}|${tenantId||""}`) and when creating
DefaultAzureCredential (in the acquireAzureToken function) pass the tenantId
option when config.azure_tenant_id is present (and likewise include tenant when
any CLI/alternative credential construction supports a tenant option) so tokens
are requested and cached per tenant; reference resolveAzureResourceUrl,
acquireAzureToken, tokenCache, config.azure_tenant_id, and
DefaultAzureCredential to locate the changes.
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 69-83: The test's Azure identity mock always returns a token so it
won't catch ignored azure_client_id; update the DefaultAzureCredential mock
(inside mock.module("@azure/identity")) to save constructor options into
azureIdentityState (e.g., set azureIdentityState.lastCtorOpts = opts in the
constructor) and then add an assertion in the test that
azureIdentityState.lastCtorOpts.managedIdentityClientId equals the expected
azure_client_id; repeat the same change for the other DefaultAzureCredential
mock used later in the file so both mocks validate the managedIdentityClientId
constructor option.
- Around line 88-151: The test currently only records cliState.lastCmd so a
regression back to using exec (shell interpolation) would still pass; modify the
mocks to track which child_process API was invoked (e.g., add cliState.lastApi
or similar) by setting it in the execStub and execFileStub returned from
makeChildProcessMock (or inside makeChildProcessMock based on how it's
constructed), then update the assertion at the test location to assert
cliState.lastApi === "execFile" (in addition to any existing command/content
assertions) so the test enforces use of execFile rather than exec. Ensure
references: makeChildProcessMock, execStub, execFileStub, cliState.lastCmd, and
the mock.module replacement for node:child_process are updated accordingly.
---
Outside diff comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 316-328: The injection of TOP after SELECT in the SQL transform
mistakenly places TOP before any optional quantifier (DISTINCT/ALL); update the
replacement logic in the block that checks isSelectLike/effectiveLimit (the code
using sql.replace) to capture an optional ALL|DISTINCT token (and following
whitespace) and insert `TOP ${effectiveLimit + 1}` after that token when
present, i.e., change the regex so it matches `SELECT` plus optional
`DISTINCT|ALL` group and use replacement groups to preserve the quantifier
positioning and spacing while inserting TOP in the correct spot.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f981ad72-dfb3-4166-abdc-8bf1fe0f5d42
📒 Files selected for processing (4)
packages/drivers/src/sqlserver.tspackages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/test/altimate/data-diff-cross-dialect.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
- packages/opencode/src/altimate/native/connections/data-diff.ts
| function resolveAzureResourceUrl(config: ConnectionConfig): string { | ||
| const explicit = config.azure_resource_url as string | undefined | ||
| const raw = explicit ?? (() => { | ||
| const host = (config.host as string | undefined) ?? "" | ||
| if (host.includes(".usgovcloudapi.net") || host.includes(".datawarehouse.fabric.microsoft.us")) { | ||
| return "https://database.usgovcloudapi.net/" | ||
| } | ||
| if (host.includes(".chinacloudapi.cn")) { | ||
| return "https://database.chinacloudapi.cn/" | ||
| } | ||
| return "https://database.windows.net/" | ||
| })() | ||
| return raw.endsWith("/") ? raw : `${raw}/` |
There was a problem hiding this comment.
Validate azure_resource_url before calling string methods.
Line 55 casts an unknown config value to string; a non-string value will throw raw.endsWith is not a function. Fail fast with a clear config error instead.
🛡️ Proposed fix
function resolveAzureResourceUrl(config: ConnectionConfig): string {
- const explicit = config.azure_resource_url as string | undefined
+ const explicitValue = config.azure_resource_url
+ if (explicitValue != null && typeof explicitValue !== "string") {
+ throw new Error("azure_resource_url must be a string when provided")
+ }
+ const explicit = explicitValue?.trim() || undefined
const raw = explicit ?? (() => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/src/sqlserver.ts` around lines 54 - 66, In
resolveAzureResourceUrl, validate that config.azure_resource_url is a string
before calling string methods: if the explicit value exists but is not a string,
throw a clear configuration error; otherwise use the explicit string, or
fallback to the existing host-based detection (using config.host) and then
ensure the returned URL ends with a slash. This prevents raw.endsWith is not a
function by checking typeof config.azure_resource_url === "string" (and
throwing) before using raw.endsWith and preserves current behavior of
resolveAzureResourceUrl.
| const resourceUrl = resolveAzureResourceUrl(config) | ||
| const clientId = (config.azure_client_id as string | undefined) ?? "" | ||
| const cacheKey = `${resourceUrl}|${clientId}` | ||
|
|
||
| const acquireAzureToken = async (): Promise<string> => { | ||
| const cached = tokenCache.get(cacheKey) | ||
| if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) { | ||
| return cached.token | ||
| } | ||
|
|
||
| let token: string | undefined | ||
| let expiresAt: number | undefined | ||
| let azureIdentityError: unknown = null | ||
| let azCliStderr = "" | ||
|
|
||
| try { | ||
| const azureIdentity = await import("@azure/identity") | ||
| const credential = new azureIdentity.DefaultAzureCredential( | ||
| config.azure_client_id | ||
| ? { managedIdentityClientId: config.azure_client_id as string } | ||
| : undefined, | ||
| ) | ||
| const tokenResponse = await credential.getToken(`${resourceUrl}.default`) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
@azure/identity DefaultAzureCredential tenantId option Azure CLI az account get-access-token --tenant
💡 Result:
In the @azure/identity JavaScript package, DefaultAzureCredential supports a tenantId option via DefaultAzureCredentialOptions. Set it explicitly when creating the credential to specify the tenant for authentication. Example: const { DefaultAzureCredential } = require("@azure/identity"); const credential = new DefaultAzureCredential({ tenantId: "your-tenant-id" }); This tenantId is passed to underlying credentials like AzureCliCredential, which calls az account get-access-token internally. Recent Azure CLI updates (merged 2025) allow --tenant with the current tenant for Cloud Shell/managed identity accounts, ensuring compatibility when the SDK passes the tenantId. To use a specific tenant with Azure CLI beforehand: - az login --tenant your-tenant-id - Or az account get-access-token --tenant your-tenant-id --resource https://management.azure.com/ (for ARM scope). The tenantId defaults to AZURE_TENANT_ID environment variable if not specified. For multi-tenant scenarios, use additionallyAllowedTenants: ["*"].
Citations:
- 1: https://learn.microsoft.com/en-us/dotnet/api/azure.identity.defaultazurecredentialoptions.tenantid?view=azure-dotnet
- 2: https://azuresdkdocs.z19.web.core.windows.net/javascript/azure-identity/4.10.1/interfaces/DefaultAzureCredentialClientIdOptions.html
- 3: https://learn.microsoft.com/en-us/dotnet/api/azure.identity.defaultazurecredentialoptions?view=azure-dotnet
- 4: https://learn.microsoft.com/en-us/javascript/api/%40azure/identity/defaultazurecredentialoptions?view=azure-node-latest
- 5: https://learn.microsoft.com/en-us/javascript/api/@azure/identity/defaultazurecredential?view=azure-node-latest
- 6: https://azuresdkdocs.z19.web.core.windows.net/javascript/azure-identity/4.10.1/interfaces/AzureCliCredentialOptions.html
- 7: https://azuresdkdocs.z19.web.core.windows.net/javascript/azure-identity/4.10.1/interfaces/DefaultAzureCredentialOptions.html
🏁 Script executed:
# Check if azure_tenant_id is used elsewhere in the config
rg "azure_tenant_id" packages/drivers/src/sqlserver.tsRepository: AltimateAI/altimate-code
Length of output: 163
🏁 Script executed:
# Get the context around lines 151-173 and 196-204 in sqlserver.ts
cat -n packages/drivers/src/sqlserver.ts | sed -n '145,210p'Repository: AltimateAI/altimate-code
Length of output: 3603
🏁 Script executed:
# Check the package.json for `@azure/identity` version
rg "@azure/identity" packages/drivers/package.json -A 1Repository: AltimateAI/altimate-code
Length of output: 50
🏁 Script executed:
# Search for other Azure authentication flows to see how they handle azure_tenant_id
rg -B 3 -A 3 "azure_tenant_id" packages/drivers/src/Repository: AltimateAI/altimate-code
Length of output: 1710
🏁 Script executed:
# Find and check package.json for `@azure/identity` version
find packages/drivers -name "package.json" -exec grep -l "azure" {} \; | head -1 | xargs cat | grep -A 2 -B 2 "azure"Repository: AltimateAI/altimate-code
Length of output: 50
🏁 Script executed:
# Check if there are multiple places in sqlserver.ts that need updating (beyond 196-204)
rg -n "execFileAsync|get-access-token" packages/drivers/src/sqlserver.tsRepository: AltimateAI/altimate-code
Length of output: 338
🏁 Script executed:
# Get context around lines 196-204 to see the second occurrence mentioned
cat -n packages/drivers/src/sqlserver.ts | sed -n '190,210p'Repository: AltimateAI/altimate-code
Length of output: 1137
Honor azure_tenant_id in token acquisition and cache key.
azure_tenant_id is accepted for other Azure AD flows, but the default/CLI token path ignores it and caches only by resource/clientId. Two connections targeting different tenants can reuse the wrong cached token or authenticate against the CLI/default tenant instead of the configured tenant.
🔐 Proposed fix
const resourceUrl = resolveAzureResourceUrl(config)
const clientId = (config.azure_client_id as string | undefined) ?? ""
- const cacheKey = `${resourceUrl}|${clientId}`
+ const tenantId = (config.azure_tenant_id as string | undefined) ?? ""
+ const cacheKey = `${resourceUrl}|${clientId}|${tenantId}`
@@
const azureIdentity = await import("@azure/identity")
- const credential = new azureIdentity.DefaultAzureCredential(
- config.azure_client_id
- ? { managedIdentityClientId: config.azure_client_id as string }
- : undefined,
- )
+ const credential = new azureIdentity.DefaultAzureCredential({
+ ...(clientId ? { managedIdentityClientId: clientId } : {}),
+ ...(tenantId ? { tenantId } : {}),
+ })
@@
[
"account", "get-access-token",
"--resource", resourceUrl,
+ ...(tenantId ? ["--tenant", tenantId] : []),
"--query", "accessToken",
"-o", "tsv",
],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/src/sqlserver.ts` around lines 151 - 173, The token cache
and acquisition ignore config.azure_tenant_id, causing tenant cross-use; update
the cacheKey in acquireAzureToken to include config.azure_tenant_id (e.g.,
`${resourceUrl}|${clientId}|${tenantId||""}`) and when creating
DefaultAzureCredential (in the acquireAzureToken function) pass the tenantId
option when config.azure_tenant_id is present (and likewise include tenant when
any CLI/alternative credential construction supports a tenant option) so tokens
are requested and cached per tenant; reference resolveAzureResourceUrl,
acquireAzureToken, tokenCache, config.azure_tenant_id, and
DefaultAzureCredential to locate the changes.
| const azureIdentityState = { | ||
| lastScope: "" as string, | ||
| tokenOverride: null as null | { token: string; expiresOnTimestamp?: number }, | ||
| throwOnGetToken: false as boolean, | ||
| } | ||
| mock.module("@azure/identity", () => ({ | ||
| DefaultAzureCredential: class { | ||
| _opts: any | ||
| constructor(opts?: any) { this._opts = opts } | ||
| async getToken(scope: string) { | ||
| azureIdentityState.lastScope = scope | ||
| if (azureIdentityState.throwOnGetToken) throw new Error("mock identity failure") | ||
| if (azureIdentityState.tokenOverride) return azureIdentityState.tokenOverride | ||
| return { token: "mock-azure-token-12345", expiresOnTimestamp: Date.now() + 3600000 } | ||
| } |
There was a problem hiding this comment.
Assert the credential constructor options.
This test would still pass if azure_client_id were ignored, because the mock always returns the same token. Track the DefaultAzureCredential constructor options and assert managedIdentityClientId.
💚 Proposed test fix
const azureIdentityState = {
lastScope: "" as string,
+ lastCredentialOpts: null as any,
tokenOverride: null as null | { token: string; expiresOnTimestamp?: number },
throwOnGetToken: false as boolean,
}
mock.module("@azure/identity", () => ({
DefaultAzureCredential: class {
_opts: any
- constructor(opts?: any) { this._opts = opts }
+ constructor(opts?: any) {
+ this._opts = opts
+ azureIdentityState.lastCredentialOpts = opts ?? null
+ }
@@
test("azure-active-directory-default with client_id passes managedIdentityClientId to credential", async () => {
resetMocks()
+ azureIdentityState.lastCredentialOpts = null
const c = await connect({
host: "myserver.database.windows.net",
database: "db",
@@
// Token is still passed as access-token regardless of client_id
expect(cfg.authentication.type).toBe("azure-active-directory-access-token")
expect(cfg.authentication.options.token).toBe("mock-azure-token-12345")
+ expect(azureIdentityState.lastCredentialOpts).toEqual({
+ managedIdentityClientId: "mi-client-id",
+ })
})Also applies to: 372-385
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 69 - 83, The
test's Azure identity mock always returns a token so it won't catch ignored
azure_client_id; update the DefaultAzureCredential mock (inside
mock.module("@azure/identity")) to save constructor options into
azureIdentityState (e.g., set azureIdentityState.lastCtorOpts = opts in the
constructor) and then add an assertion in the test that
azureIdentityState.lastCtorOpts.managedIdentityClientId equals the expected
azure_client_id; repeat the same change for the other DefaultAzureCredential
mock used later in the file so both mocks validate the managedIdentityClientId
constructor option.
| const cliState = { | ||
| lastCmd: "" as string, | ||
| output: "mock-cli-token-fallback\n" as string, | ||
| throwError: null as null | { stderr?: string; message?: string }, | ||
| } | ||
| const realChildProcess = await import("node:child_process") | ||
| const realUtil = await import("node:util") | ||
|
|
||
| // Helper: build a mock with callback + util.promisify.custom support so | ||
| // `promisify(child_process.exec)` or `promisify(child_process.execFile)` | ||
| // yields { stdout, stderr } exactly like the real implementation. | ||
| function makeChildProcessMock(captureCmd: (args: string) => void) { | ||
| const stub: any = (arg0: any, arg1: any, arg2: any, arg3: any) => { | ||
| // Accept both exec(cmd, opts?, cb?) and execFile(file, args?, opts?, cb?) | ||
| const cb = [arg0, arg1, arg2, arg3].find((x) => typeof x === "function") | ||
| // Pick the best "command" representation for test assertions: | ||
| // - exec: first arg is the full command string | ||
| // - execFile: first arg is the program, second arg is the args array | ||
| if (Array.isArray(arg1)) { | ||
| captureCmd(`${arg0} ${arg1.join(" ")}`) | ||
| } else { | ||
| captureCmd(String(arg0)) | ||
| } | ||
| if (cliState.throwError) { | ||
| const e: any = new Error(cliState.throwError.message ?? "az failed") | ||
| e.stderr = cliState.throwError.stderr | ||
| if (cb) cb(e, "", cliState.throwError.stderr ?? "") | ||
| return { on() {}, stdout: null, stderr: null } | ||
| } | ||
| if (cb) cb(null, cliState.output, "") | ||
| return { on() {}, stdout: null, stderr: null } | ||
| } | ||
| stub[realUtil.promisify.custom] = (arg0: any, arg1: any) => { | ||
| if (Array.isArray(arg1)) { | ||
| captureCmd(`${arg0} ${arg1.join(" ")}`) | ||
| } else { | ||
| captureCmd(String(arg0)) | ||
| } | ||
| if (cliState.throwError) { | ||
| const e: any = new Error(cliState.throwError.message ?? "az failed") | ||
| e.stderr = cliState.throwError.stderr | ||
| return Promise.reject(e) | ||
| } | ||
| return Promise.resolve({ stdout: cliState.output, stderr: "" }) | ||
| } | ||
| return stub | ||
| } | ||
|
|
||
| const execStub = makeChildProcessMock((c) => { cliState.lastCmd = c }) | ||
| const execFileStub = makeChildProcessMock((c) => { cliState.lastCmd = c }) | ||
|
|
||
| mock.module("node:child_process", () => ({ | ||
| ...realChildProcess, | ||
| execSync: (cmd: string) => { | ||
| cliState.lastCmd = cmd | ||
| if (cliState.throwError) { | ||
| const e: any = new Error(cliState.throwError.message ?? "az failed") | ||
| e.stderr = cliState.throwError.stderr | ||
| throw e | ||
| } | ||
| return cliState.output | ||
| }, | ||
| exec: execStub, | ||
| execFile: execFileStub, |
There was a problem hiding this comment.
Assert the CLI fallback uses execFile.
Both exec and execFile currently update cliState.lastCmd, so the test at Line 774 would still pass if the driver regressed back to shell interpolation. Track the invoked child-process API and assert execFile.
🧪 Proposed test hardening
const cliState = {
lastCmd: "" as string,
+ lastMethod: "" as "exec" | "execFile" | "execSync" | "",
output: "mock-cli-token-fallback\n" as string,
throwError: null as null | { stderr?: string; message?: string },
}
@@
-function makeChildProcessMock(captureCmd: (args: string) => void) {
+function makeChildProcessMock(method: "exec" | "execFile", captureCmd: (args: string) => void) {
const stub: any = (arg0: any, arg1: any, arg2: any, arg3: any) => {
+ cliState.lastMethod = method
@@
stub[realUtil.promisify.custom] = (arg0: any, arg1: any) => {
+ cliState.lastMethod = method
@@
-const execStub = makeChildProcessMock((c) => { cliState.lastCmd = c })
-const execFileStub = makeChildProcessMock((c) => { cliState.lastCmd = c })
+const execStub = makeChildProcessMock("exec", (c) => { cliState.lastCmd = c })
+const execFileStub = makeChildProcessMock("execFile", (c) => { cliState.lastCmd = c })
@@
execSync: (cmd: string) => {
+ cliState.lastMethod = "execSync"
cliState.lastCmd = cmd
@@
await c.connect()
+ expect(cliState.lastMethod).toBe("execFile")
expect(cliState.lastCmd).toContain("--resource https://database.usgovcloudapi.net/")
})Also applies to: 764-775
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 88 - 151, The test
currently only records cliState.lastCmd so a regression back to using exec
(shell interpolation) would still pass; modify the mocks to track which
child_process API was invoked (e.g., add cliState.lastApi or similar) by setting
it in the execStub and execFileStub returned from makeChildProcessMock (or
inside makeChildProcessMock based on how it's constructed), then update the
assertion at the test location to assert cliState.lastApi === "execFile" (in
addition to any existing command/content assertions) so the test enforces use of
execFile rather than exec. Ensure references: makeChildProcessMock, execStub,
execFileStub, cliState.lastCmd, and the mock.module replacement for
node:child_process are updated accordingly.
Summary
connect/execute/listSchemas/listTables/describeTable/closewith T-SQLTOPinjection,sys.*catalog queries, and row flattening for unnamed columnsdefault,password,access-token,service-principal-secret,msi-vm,msi-app-service) with shorthand aliases (cli,default,password,service-principal,msi)sqlserver/mssql→tsql,fabric→fabricwithDATETRUNC()andCONVERT(DATE, ..., 23)for date partitioningConnectionPoolisolation for concurrent connections, unnamed-column array flattening, synthetic column name fallbackKey files
packages/drivers/src/sqlserver.ts,packages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/src/altimate/tools/data-diff.ts.opencode/skills/data-parity/SKILL.mdTest plan
bun test packages/drivers/test/sqlserver-unit.test.ts— 28 tests passbun test packages/opencode/test/altimate/data-diff-dialect.test.ts— 9 tests pass🤖 Generated with Claude Code
Summary by cubic
Adds full MSSQL and Microsoft Fabric support to the data-parity skill with Azure AD auth, dialect mapping, and a
mssqlv12 upgrade. Improves cross‑warehouse SQL handling and Azure auth reliability with token caching, sovereign cloud/resource URL support, safer token acquisition, and partitioning fixes.New Features
sqlserver/mssql→tsql,fabric→fabric,postgresql→postgres,mariadb→mysql).ConnectionPoolisolation, Azure AD access‑token cache with proactive refresh, automatic/overrideableazure_resource_url(Gov/China), asyncazCLI fallback;@azure/identityis no longer a peer dependency (bundled viatedious).Bug Fixes
CONVERT(DATE, ..., 23)for T‑SQL/Fabric, escape single quotes, and propagateextra_columnsto recursive runs.joindiffacross different warehouses with a clear error suggestinghashdiff/auto.azure_resource_urland useexecFileforaztoken acquisition.authentication.bun:testleakage; CI flakiness resolved.Written for commit 282876b. Summary will update on new commits.
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
Documentation
Tests