Skip to content

feat: add MSSQL/Fabric support to data-parity skill#705

Open
suryaiyer95 wants to merge 20 commits intomainfrom
feat/mssql-fabric-data-parity
Open

feat: add MSSQL/Fabric support to data-parity skill#705
suryaiyer95 wants to merge 20 commits intomainfrom
feat/mssql-fabric-data-parity

Conversation

@suryaiyer95
Copy link
Copy Markdown
Contributor

@suryaiyer95 suryaiyer95 commented Apr 13, 2026

Summary

  • SQL Server driver: full connect/execute/listSchemas/listTables/describeTable/close with T-SQL TOP injection, sys.* catalog queries, and row flattening for unnamed columns
  • Azure AD authentication: 7 flows (default, password, access-token, service-principal-secret, msi-vm, msi-app-service) with shorthand aliases (cli, default, password, service-principal, msi)
  • Dialect mapping: sqlserver/mssqltsql, fabricfabric with DATETRUNC() and CONVERT(DATE, ..., 23) for date partitioning
  • mssql v12 upgrade: ConnectionPool isolation for concurrent connections, unnamed-column array flattening, synthetic column name fallback
  • SKILL.md documentation: Fabric connection config, Azure AD auth types, algorithm behavior, schema inspection queries
  • 28 driver unit tests + 9 dialect mapping tests

Key files

Area Files
Driver packages/drivers/src/sqlserver.ts, packages/drivers/test/sqlserver-unit.test.ts
Dialect mapping packages/opencode/src/altimate/native/connections/data-diff.ts
Tool registration packages/opencode/src/altimate/tools/data-diff.ts
Skill docs .opencode/skills/data-parity/SKILL.md

Test plan

  • bun test packages/drivers/test/sqlserver-unit.test.ts — 28 tests pass
  • bun test packages/opencode/test/altimate/data-diff-dialect.test.ts — 9 tests pass
  • Integration test against real MSSQL/Fabric instance
  • Verify Azure AD auth flows with actual Azure credentials

🤖 Generated with Claude Code


Summary by cubic

Adds full MSSQL and Microsoft Fabric support to the data-parity skill with Azure AD auth, dialect mapping, and a mssql v12 upgrade. Improves cross‑warehouse SQL handling and Azure auth reliability with token caching, sovereign cloud/resource URL support, safer token acquisition, and partitioning fixes.

  • New Features

    • Data-parity: dialect-aware identifier quoting and per‑dialect mapping (sqlserver/mssqltsql, fabricfabric, postgresqlpostgres, mariadbmysql).
    • SQL Server driver: ConnectionPool isolation, Azure AD access‑token cache with proactive refresh, automatic/overrideable azure_resource_url (Gov/China), async az CLI fallback; @azure/identity is no longer a peer dependency (bundled via tedious).
  • Bug Fixes

    • Cross-warehouse SQL‑query mode: inject side‑specific CTEs per task to keep each engine from parsing the other side’s CTE.
    • Cross‑dialect partitioned diffs: build per‑side partition WHERE clauses, wrap as subqueries, normalize dates to ISO + CONVERT(DATE, ..., 23) for T‑SQL/Fabric, escape single quotes, and propagate extra_columns to recursive runs.
    • Guardrails: disallow joindiff across different warehouses with a clear error suggesting hashdiff/auto.
    • MySQL/MariaDB partitions: stop ISO date rewrite for week partitions; non‑T‑SQL dialects now use raw values.
    • Azure auth hardening: normalize trailing slash on azure_resource_url and use execFile for az token acquisition.
    • Driver robustness: preserve named headers when flattening unnamed aggregate columns; fail fast on unknown Azure AD subtypes and guard non‑string authentication.
    • Diff output: show "(no values)" when a diff row has an empty values array.
    • Test reliability: export SQL‑builder helpers and remove global mocks to prevent bun:test leakage; CI flakiness resolved.

Written for commit 282876b. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Microsoft Fabric recognized as a connection type; Fabric and SQL Server share configuration aliases and dialect mapping.
  • Enhancements

    • Expanded Azure AD authentication (access-token, service-principal, MSI) with token caching, fallback, and enforced encryption for AD flows.
    • Dialect-aware identifier quoting and date/partition SQL generation for cross-dialect diffs.
  • Bug Fixes

    • Improved query result shaping, truncation reporting, and handling of unnamed/mixed columns.
  • Documentation

    • Added Fabric/SQL Server connection guidance and schema inspection examples.
  • Tests

    • Extensive tests for auth, Fabric, dialects, CTE injection, and diff logic.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Microsoft Fabric support and dialect-aware data-diff logic, extends the SQL Server driver with Azure AD token acquisition (cached) plus az CLI fallback, normalizes Fabric connection fields, bumps optional mssql to ^12.0.0, and includes tests and documentation updates.

Changes

Cohort / File(s) Summary
Driver deps & normalization
packages/drivers/package.json, packages/drivers/src/normalize.ts
Bumped optional mssql to ^12.0.0; expanded SQL Server–family alias map to normalize authentication, azure_*, access_token, and azure_resource_url; applied aliases to fabric.
SQL Server / Fabric driver + tests
packages/drivers/src/sqlserver.ts, packages/drivers/test/sqlserver-unit.test.ts
Added module token cache with expiry parsing, Azure resource-url resolution, token acquisition via @azure/identity with az CLI fallback, auth-subtype normalization, connection-pool instantiation, result truncation/flattening logic, exported test helper to clear cache, and comprehensive unit tests covering auth flows, caching, SQL rewriting, truncation, and schema introspection.
Registry wiring & registry test
packages/opencode/src/altimate/native/connections/registry.ts, packages/opencode/test/altimate/connections.test.ts
Registered fabric@altimateai/drivers/sqlserver, added fabric to password-validated driver types, and added a registry test verifying Fabric configs are recognized.
Data-diff core & tests
packages/opencode/src/altimate/native/connections/data-diff.ts, packages/opencode/test/altimate/data-diff-cte.test.ts, packages/opencode/test/altimate/data-diff-dialect.test.ts, packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
Added WAREHOUSE_TO_DIALECT + warehouseTypeToDialect(), stricter SQL-query detection, side-specific CTE prefix resolution, dialect-aware identifier quoting, dialect-specific date_trunc/partition WHERE generation (T-SQL/Fabric special-casing), per-partition subquery wrapping, cross-warehouse joindiff guard, and corresponding unit tests.
Formatting / UX tweak
packages/opencode/src/altimate/tools/data-diff.ts
Render "(no values)" when sample-differences d.values is an empty array.
Normalize-config test
packages/opencode/test/altimate/driver-normalize.test.ts
Added test validating fabric aliasing (serverhost, trustServerCertificatetrust_server_certificate) and expected field removals.
Documentation
.opencode/skills/data-parity/SKILL.md
Added SQL Server / Fabric schema-introspection SQL snippet and Fabric-specific notes (DATETRUNC requirement, supported warehouses/auth mappings, connection params, and algorithm↔dialect mapping).

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Driver as SQL Server Driver
    participant AzureId as `@azure/identity`
    participant CLI as az CLI
    participant AzureAD as Azure AD
    participant MsSQL as mssql Library

    App->>Driver: connect(config with Azure AD auth)
    Driver->>Driver: normalize auth type & resolve resource URL
    alt azure identity available
        Driver->>AzureId: getToken(scope)
        AzureId->>AzureAD: request token
        AzureAD-->>AzureId: return token
        AzureId-->>Driver: access token (cached)
    else fallback to az CLI
        Driver->>CLI: exec("az account get-access-token --resource <res>")
        CLI->>AzureAD: request token
        AzureAD-->>CLI: return token
        CLI-->>Driver: token string (cached)
    end
    Driver->>MsSQL: new ConnectionPool(config with token)
    MsSQL-->>Driver: connected pool
    Driver-->>App: connected instance
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • mdesmet
  • kulvirgit

Poem

🐇 I hopped through configs, tokens in tow,
I fetched a scope where Fabrics grow,
I swapped dialects, wrapped tables neat,
Cached tokens warm upon my feet,
A joyful rabbit cheers this code—hooray! 🎉

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is missing the required 'PINEAPPLE' keyword at the top for AI-generated contributions, as mandated by the description template. Add the word 'PINEAPPLE' at the very top of the PR description before any other content, as required for all AI-generated contributions.
Docstring Coverage ⚠️ Warning Docstring coverage is 57.89% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding MSSQL/Fabric support to the data-parity skill, which aligns with the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/mssql-fabric-data-parity

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 30 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>

<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>

<file name="packages/opencode/src/altimate/native/connections/registry.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>

<file name="packages/opencode/src/altimate/tools/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>

<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts Outdated
Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts
Comment thread packages/opencode/src/altimate/native/connections/registry.ts
Comment thread packages/opencode/src/altimate/tools/data-diff.ts Outdated
@suryaiyer95 suryaiyer95 force-pushed the feat/mssql-fabric-data-parity branch from 403477e to 811c2be Compare April 13, 2026 17:36
@anandgupta42
Copy link
Copy Markdown
Contributor

@suryaiyer95 can you address code review comments?

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:90">
P2: Avoid `execSync` in the async connection path; it can block the event loop for up to the CLI timeout.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/drivers/src/sqlserver.ts Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:819">
P2: `crossWarehouse` is computed from unresolved params, so an omitted `source_warehouse` can be misclassified as cross-warehouse when both sides actually run on the same default warehouse.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts Outdated
@suryaiyer95 suryaiyer95 changed the base branch from feat/data-parity-skill-improvements to main April 17, 2026 05:22
suryaiyer95 and others added 15 commits April 16, 2026 22:26
Bun's runtime never fires native addon async callbacks, so the async
`new duckdb.Database(path, opts, callback)` form would hit the 2-second
timeout fallback on every connection attempt.

Switch to the synchronous constructor form `new duckdb.Database(path)` /
`new duckdb.Database(path, opts)` which throws on error and completes
immediately in both Node and bun runtimes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly
at e3df5a4. The timeout was caused by a missing duckdb .node binary,
not a bun incompatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql,
  fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical
  serde mismatch where Rust engine rejects raw warehouse type names.
- Update both `resolveDialect()` functions to use the mapping
- Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col)
- Add locale-safe date literal casting via CONVERT(DATE, ..., 23)
- Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver)
- Add `fabric` normalize aliases in normalize.ts
- Add 15 SQL Server driver unit tests (TOP injection, truncation,
  schema introspection, connection lifecycle, result format)
- Add 9 dialect mapping unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`:
  `azure-active-directory-password`, `access-token`, `service-principal-secret`,
  `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential`
- Force TLS encryption for all Azure AD connections
- Dynamic import of `@azure/identity` for `DefaultAzureCredential`
- Add normalize aliases for Azure AD config fields (`authentication`,
  `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`)
- Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES
- Add 10 Azure AD unit tests covering all auth flows, encryption,
  and `DefaultAzureCredential` with managed identity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md

- Add SQL Server / Fabric schema inspection query in Step 2
- Add "SQL Server and Microsoft Fabric" section with:
  - Supported configurations table (sqlserver, mssql, fabric)
  - Fabric connection guide with Azure AD auth types
  - Algorithm behavior notes (joindiff vs hashdiff selection)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter

- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
  instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
  `@azure/identity` internally and creates credentials — avoids bun CJS/ESM
  `isTokenCredential` boundary issue that caused "not an instance of the token
  credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
  `msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
  columns — it stripped legitimate aliases like `_p` used by partition discovery,
  causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
  update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
  real MSSQL server (profile, joindiff, auto, where_clause, partitioned).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lattening

- Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19)
- Use explicit `ConnectionPool` instead of global `mssql.connect()` to
  isolate multiple simultaneous connections
- Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g.
  `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string
  key; restore positional column values
- Proper column name resolution: compare `namedKeys.length` against
  flattened row length, fall back to synthetic `col_0`, `col_1`, etc.
- Update test mock to export `ConnectionPool` class and `createMockPool`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- P1: restrict `flattenRow` to only spread the empty-string key (`""`)
  where mssql merges unnamed columns, preserving legitimate array values
- P2: escape single quotes in `partitionValue` for date-mode branches in
  `buildPartitionWhereClause` (categorical mode already escaped)
- P2: add `fabric` to `PASSWORD_DRIVERS` set in registry for consistent
  password validation alongside `sqlserver`/`mssql`
- P2: fallback to `"(no values)"` when `d.values` is nullish to prevent
  template literal coercing `undefined` to the string `"undefined"`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sqlserver-unit: 3 tests for unnamed column flattening — verifies only
  the empty-string key is spread, legitimate named arrays are preserved
- driver-normalize: fabric type uses SQLSERVER_ALIASES (server → host,
  trustServerCertificate → trust_server_certificate)
- connections: fabric type is recognized in DRIVER_MAP and listed correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Minimum Version Requirements" table to SKILL.md covering SQL Server
  2022+, mssql v12, and @azure/identity v4 with rationale for each
- Document auth shorthands (CLI, default, password, service-principal, msi)
- Move @azure/identity from dependencies to optional peerDependencies so
  it is NOT installed by default — only required for Azure AD auth
- Add runtime check in sqlserver driver: if Azure AD auth type is requested
  but @azure/identity is missing, throw a clear install instruction error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solution

- For `azure-active-directory-default` (CLI/default auth), acquire token
  ourselves instead of delegating to tedious's internal `@azure/identity`
- Strategy: try `DefaultAzureCredential` first, fall back to `az` CLI subprocess
- Bypasses Bun resolving `@azure/identity` to browser bundle where
  `DefaultAzureCredential` is a non-functional stub
- Also bypasses CJS/ESM `isTokenCredential` boundary mismatch
- All 31 driver unit tests pass, verified against real Fabric endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken` when none supplied

The `azure-active-directory-access-token` branch passed
`token: config.token ?? config.access_token` to tedious. When neither
field was set on a connection (e.g. a `fabric-migration` entry that
declared the auth type but no token), tedious threw:

    TypeError: The "config.authentication.options.token" property
    must be of type string

This blocked any Fabric/MSSQL config that relied on ambient credentials
(Azure CLI / managed identity) but used the explicit
`azure-active-directory-access-token` type instead of the `default`
shorthand.

Refactor token acquisition (`DefaultAzureCredential` → `az` CLI
fallback) into a shared `acquireAzureToken()` helper used by both the
`default` path and the `access-token` path when no token was supplied.
Callers that pass an explicit token are unchanged.

Also harden `mock.module("node:child_process", ...)` in
`sqlserver-unit.test.ts` to spread the real module so sibling tests in
the same `bun test` run keep access to `spawn` / `exec` / `fork`.

Tests: 110 pass, 0 fail in `packages/drivers`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ry mode

When `source` and `target` are both SQL queries, `resolveTableSources` wraps
them as `__diff_source` / `__diff_target` CTEs and the executor prepends the
combined `WITH …` block to every engine-emitted task. T-SQL and Fabric
parse-bind every CTE body even when unreferenced, so a task routed to the
source warehouse failed to resolve the target-only base table referenced
inside the unused `__diff_target` CTE (and vice versa), producing
`Invalid object name` errors from the wrong warehouse.

Return side-specific prefixes from `resolveTableSources` alongside the
combined one, and have the executor loop in `runDataDiff` pick the source
or target prefix per task when `source_warehouse !== target_warehouse`.
Same-warehouse behaviour is unchanged.

Adds `data-diff-cte.test.ts` covering plain-name passthrough, both-query
wrapping, side-specific CTE isolation, and CTE merging with engine-emitted
`WITH` clauses (10 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 333a45c moved `@azure/identity` from `optionalDependencies` to
`peerDependencies` with `optional: true` in `packages/drivers/package.json`,
but the lockfile was not regenerated. That left CI under `--frozen-lockfile`
broken and made fresh installs silently diverge from the committed state.

Running `bun install` brings the lockfile in sync: `@azure/identity` is
recorded as an optional peer, and its transitive pins (`@azure/msal-browser`,
`@azure/msal-common`, `@azure/msal-node`) re-resolve to the versions required
by `tedious` and `snowflake-sdk`, matching the reachable runtime surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@suryaiyer95 suryaiyer95 force-pushed the feat/mssql-fabric-data-parity branch from 52fe09e to 1977232 Compare April 17, 2026 05:26
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (4)
packages/opencode/src/altimate/native/connections/data-diff.ts (1)

815-820: Cross-warehouse detection is a purely textual compare.

sourceWarehouse !== targetWarehouse treats undefined vs an explicit name as "cross-warehouse" even when the explicit name is the default (first) warehouse. The side-specific CTE injection is a strict superset of the combined prefix, so correctness isn't affected — just a minor inefficiency (slightly narrower CTE sent). Not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts` around lines
815 - 820, Summary: crossWarehouse currently does a raw textual compare
(sourceWarehouse !== targetWarehouse) which can misclassify undefined vs an
explicit default warehouse name as cross-warehouse. Fix: canonicalize both sides
before comparing by resolving undefined/empty values to the effective/default
warehouse name used elsewhere (use the same resolver used for
connection/effective settings) instead of comparing raw params.source_warehouse
and params.target_warehouse; update the variables referenced (sourceWarehouse,
targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !==
resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add
one) rather than inventing a new default string.
packages/drivers/test/sqlserver-unit.test.ts (1)

483-489: Assertion gap lets the "mixed named + unnamed" column-naming bug slip through.

This test verifies rows for { name: "alice", "": [42] } but doesn't assert result.columns. Under the current implementation (sqlserver.ts Lines 210-217), mixing a named column with multiple unnamed values (e.g. { name: "alice", "": [42, 100] }) collapses the header to ["col_0", "col_1", "col_2"] and drops name. Adding a case with ≥ 2 unnamed values and asserting columns would have caught that and will guard whichever fix lands.

♻️ Suggested additional case
     test("handles mix of named and unnamed columns", async () => {
       mockQueryResult = {
         recordset: [{ name: "alice", "": [42] }],
       }
       const result = await connector.execute("SELECT * FROM t")
       expect(result.rows).toEqual([["alice", 42]])
     })
+
+    test("preserves named-column header when mixed with multiple unnamed columns", async () => {
+      // SELECT name, COUNT(*), SUM(x) FROM t → { name: "alice", "": [42, 100] }
+      mockQueryResult = {
+        recordset: [{ name: "alice", "": [42, 100] }],
+      }
+      const result = await connector.execute("SELECT name, COUNT(*), SUM(x) FROM t")
+      expect(result.rows).toEqual([["alice", 42, 100]])
+      expect(result.columns[0]).toBe("name")
+    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 483 - 489, Add a
new test variant in packages/drivers/test/sqlserver-unit.test.ts that uses a
recordset like { name: "alice", "": [42, 100] } (i.e., multiple unnamed values)
and assert both result.rows and result.columns so the header preserves the named
column "name" and includes generated column names for the unnamed values (e.g.,
["name","col_1","col_2"]); this will surface the bug referenced around
sqlserver.ts (logic near lines handling column mapping) and prevent regressions
in connector.execute's column assembly.
packages/opencode/test/altimate/data-diff-dialect.test.ts (1)

51-54: Consider expanding case-insensitivity coverage.

The case-insensitivity block only exercises sqlserver and postgresql. Since mssql, fabric, and mariadb go through the same normalization path, a one-line additional assertion (e.g., MSSQLtsql, FABRICfabric) would guard against a future regression where only some branches get lowercased.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/data-diff-dialect.test.ts` around lines 51 -
54, Add assertions to the "handles uppercase input" test to cover other inputs
that share the normalization path so case-insensitivity can't regress: update
the test block in packages/opencode/test/altimate/data-diff-dialect.test.ts that
calls warehouseTypeToDialect to also assert uppercase variants such as "MSSQL"
-> "tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever
expected dialects map in warehouseTypeToDialect) so all branches that are
lowercased are exercised.
packages/drivers/src/sqlserver.ts (1)

89-100: execSync blocks the event loop during connect().

execSync("az account get-access-token ...") with a 15 s timeout runs synchronously on the main thread inside an async function — any other pending I/O on the runtime is stalled while Azure CLI spins up (typically 0.5–2 s, occasionally much longer). Prefer the async execFile (or promisify(exec)) so the token fetch cooperates with the event loop.

♻️ Suggested change
-            try {
-              const { execSync } = await import("node:child_process")
-              const out = execSync(
-                "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv",
-                { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
-              ).trim()
-              if (out) token = out
-            } catch {
-              // az CLI not installed or not logged in
-            }
+            try {
+              const { execFile } = await import("node:child_process")
+              const { promisify } = await import("node:util")
+              const run = promisify(execFile)
+              const { stdout } = await run(
+                "az",
+                ["account", "get-access-token", "--resource", "https://database.windows.net/", "--query", "accessToken", "-o", "tsv"],
+                { encoding: "utf-8", timeout: 15000 },
+              )
+              const out = stdout.trim()
+              if (out) token = out
+            } catch {
+              // az CLI not installed or not logged in
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 89 - 100, The current token
fetch inside connect() uses sync child process invocation (execSync) which
blocks the event loop; replace it with an asynchronous call (e.g., use
child_process.execFile or promisified exec via util.promisify) so the Azure CLI
call runs without blocking. Locate the block that imports execSync and sets
token (the token variable and the try/catch that calls "az account
get-access-token ...") and change it to call the async API (await the
execFile/promisified exec) with the same args and timeout, handle stdout.trim()
to set token, and preserve the existing catch behavior for missing CLI or auth
failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.opencode/skills/data-parity/SKILL.md:
- Around line 453-458: The fenced code block showing the pseudo-YAML config (the
block containing type: "fabric", host:
"<workspace-id>-<item-id>.datawarehouse.fabric.microsoft.com", database:
"<warehouse-name>", authentication: "azure-active-directory-default") is missing
a language tag; change the opening fence from ``` to ```yaml (or ```text) so
markdownlint MD040 is satisfied and the block is treated as YAML.

In `@packages/drivers/package.json`:
- Around line 20-33: Remove the redundant optional peer dependency entry for
"@azure/identity" from package.json: delete the "@azure/identity" key inside the
"peerDependencies" object and its corresponding entry in "peerDependenciesMeta"
since "mssql"/"tedious" already bundles it; update the package.json so only the
other dependencies (e.g., "mssql", "oracledb", "duckdb", "mongodb",
"@clickhouse/client") remain and no "@azure/identity" references exist under
"peerDependencies" or "peerDependenciesMeta".

In `@packages/drivers/src/sqlserver.ts`:
- Around line 53-54: The code calls rawAuth.toLowerCase() without ensuring
rawAuth is a string, causing a TypeError for non-string authentication values;
update the computation of authType in the sqlserver driver to first check typeof
rawAuth === "string" before using toLowerCase and looking up AUTH_SHORTHANDS,
e.g. only apply AUTH_SHORTHANDS[rawAuth.toLowerCase()] when rawAuth is a string,
otherwise pass rawAuth through (or leave undefined) so non-string auth blocks
(config.authentication) don't blow up at runtime; adjust the logic around
rawAuth, authType, and AUTH_SHORTHANDS to be type-safe and explicit.
- Around line 56-158: The code path that handles authType starting with
"azure-active-directory" (see authType variable and mssqlConfig.authentication
assignments) does not handle unknown subtypes and can leave mssqlConfig without
authentication; add a terminal else branch after the existing
azure-active-directory-* branches that throws a clear Error (e.g. mentioning the
unsupported authType) so unknown or misspelled Azure AD subtypes fail fast;
ensure the thrown error references authType so callers get an explicit message
instead of a later opaque tedious error.
- Around line 76-87: The empty catch around the `@azure/identity` import and
credential.getToken is hiding real authentication errors; change the catch to
capture the thrown error (e.g., catch (err) { azureIdentityErr = err }) and
preserve that error in a local variable, then either append its message/details
to the final thrown error or log it at debug level when token remains undefined;
update the block that currently throws the generic "install `@azure/identity` or
run az login" message to include the captured azureIdentityErr (or its
message/stack) so real AAD/authentication failures are visible while still
tolerating missing-module/browser-bundle cases.
- Around line 208-217: The current columns logic replaces known named headers
when sampleFlat is longer (mixed named + unnamed aggregates); change the
construction in the columns branch to preserve the existing named prefix
(namedKeys) and only synthesize column names for the extra flattened entries
(e.g., generate col_i for indices >= namedKeys.length up to
sampleFlat.length-1), falling back to result.recordset.columns only when both
namedKeys and sampleFlat are empty; update the conditional around
sampleFlat/namedKeys where columns is assigned (the expression that uses
sampleFlat, namedKeys, recordset, and result.recordset.columns) so known headers
are retained and only the tail is renamed.
- Around line 162-167: Remove the fallback to the global shared pool to avoid
reintroducing cross-database isolation; in the block that currently checks
MssqlConnectionPool, ensure that if MssqlConnectionPool is falsy you throw an
explicit error instead of calling mssql.connect, i.e., keep the branch that
constructs and connects a new MssqlConnectionPool(mssqlConfig) and replace the
else branch with a throw that references MssqlConnectionPool and mssql.connect
in the message so reviewers can quickly locate and understand the change.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 154-159: The test "empty result returns correctly" uses a fake
mssql response shape by setting mockQueryResult.recordset_columns, but the
driver reads result.recordset?.columns; update the test to match real mssql
shape by removing recordset_columns and instead set mockQueryResult.recordset =
[] and, if you need to include column metadata, set
mockQueryResult.recordset.columns = {} (or leave absent for empty results) so
the driver code that checks result.recordset?.columns and the code paths in
connector.execute behave against the real response shape.

---

Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 89-100: The current token fetch inside connect() uses sync child
process invocation (execSync) which blocks the event loop; replace it with an
asynchronous call (e.g., use child_process.execFile or promisified exec via
util.promisify) so the Azure CLI call runs without blocking. Locate the block
that imports execSync and sets token (the token variable and the try/catch that
calls "az account get-access-token ...") and change it to call the async API
(await the execFile/promisified exec) with the same args and timeout, handle
stdout.trim() to set token, and preserve the existing catch behavior for missing
CLI or auth failures.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 483-489: Add a new test variant in
packages/drivers/test/sqlserver-unit.test.ts that uses a recordset like { name:
"alice", "": [42, 100] } (i.e., multiple unnamed values) and assert both
result.rows and result.columns so the header preserves the named column "name"
and includes generated column names for the unnamed values (e.g.,
["name","col_1","col_2"]); this will surface the bug referenced around
sqlserver.ts (logic near lines handling column mapping) and prevent regressions
in connector.execute's column assembly.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 815-820: Summary: crossWarehouse currently does a raw textual
compare (sourceWarehouse !== targetWarehouse) which can misclassify undefined vs
an explicit default warehouse name as cross-warehouse. Fix: canonicalize both
sides before comparing by resolving undefined/empty values to the
effective/default warehouse name used elsewhere (use the same resolver used for
connection/effective settings) instead of comparing raw params.source_warehouse
and params.target_warehouse; update the variables referenced (sourceWarehouse,
targetWarehouse, crossWarehouse) so crossWarehouse = resolvedSourceWarehouse !==
resolvedTargetWarehouse. Ensure you call the existing resolution helper (or add
one) rather than inventing a new default string.

In `@packages/opencode/test/altimate/data-diff-dialect.test.ts`:
- Around line 51-54: Add assertions to the "handles uppercase input" test to
cover other inputs that share the normalization path so case-insensitivity can't
regress: update the test block in
packages/opencode/test/altimate/data-diff-dialect.test.ts that calls
warehouseTypeToDialect to also assert uppercase variants such as "MSSQL" ->
"tsql", "FABRIC" -> "fabric" and "MARIADB" -> "mariadb" (or whichever expected
dialects map in warehouseTypeToDialect) so all branches that are lowercased are
exercised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6589a3e0-5c1a-4ac6-bd8a-e405334413ca

📥 Commits

Reviewing files that changed from the base of the PR and between c438bb0 and 1977232.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .opencode/skills/data-parity/SKILL.md
  • packages/drivers/package.json
  • packages/drivers/src/normalize.ts
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/src/altimate/native/connections/registry.ts
  • packages/opencode/src/altimate/tools/data-diff.ts
  • packages/opencode/test/altimate/connections.test.ts
  • packages/opencode/test/altimate/data-diff-cte.test.ts
  • packages/opencode/test/altimate/data-diff-dialect.test.ts
  • packages/opencode/test/altimate/driver-normalize.test.ts

Comment thread .opencode/skills/data-parity/SKILL.md Outdated
Comment thread packages/drivers/package.json
Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/drivers/src/sqlserver.ts
Comment thread packages/drivers/src/sqlserver.ts
Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/drivers/test/sqlserver-unit.test.ts
Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts
Fixes five correctness, reliability, and portability issues surfaced by the
consensus code review of this branch.

CRITICAL #1 — Cross-dialect partitioned diff (`data-diff.ts`):
`runPartitionedDiff` built one partition WHERE clause with `sourceDialect`
and passed it as shared `where_clause` to the recursive `runDataDiff`, which
applied it to both warehouses identically. Cross-dialect partition mode
(MSSQL → Postgres) failed because the target received T-SQL
`DATETRUNC`/`CONVERT(DATE, …, 23)`. Now builds per-side WHERE using each
warehouse's dialect and bakes it into dialect-quoted subquery SQL for source
and target independently. The existing side-aware CTE injection handles the
rest.

MAJOR #2 — Azure AD token caching and refresh (`sqlserver.ts`):
`acquireAzureToken` fetched a fresh token on every `connect()` and embedded
it in the pool config with no refresh. Long-lived sessions silently failed
when the ~1h token expired. Adds a module-scoped cache keyed by
`(resource, client_id)` with proactive refresh 5 min before expiry, parsing
`expiresOnTimestamp` from `@azure/identity` or the JWT `exp` claim from the
`az` CLI fallback. Exposes `_resetTokenCacheForTests` for isolation.

MAJOR #3 — `joindiff` + cross-warehouse guard (`data-diff.ts`):
Explicit `algorithm: "joindiff"` combined with different warehouses produced
broken SQL (one task referencing two CTE aliases with only one injected).
Now returns an early error with a clear message steering users to `hashdiff`
or `auto`. Cross-warehouse detection switched from warehouse-name string
compare to dialect compare, matching the underlying SQL-divergence invariant.

MAJOR #4 — Dialect-aware identifier quoting in CTE wrapping (`data-diff.ts`):
`resolveTableSources` wrapped plain-table names with ANSI double-quotes for
all dialects. T-SQL/Fabric require `QUOTED_IDENTIFIER ON` for this to work;
default for `mssql`/tedious is ON, but user contexts (stored procs, legacy
collations) can override. Now accepts source/target dialect parameters and
delegates to `quoteIdentForDialect`, which was hoisted to module scope so
it can be reused across partition and CTE paths.

MAJOR #5 — Configurable Azure resource URL (`sqlserver.ts`, `normalize.ts`):
Token acquisition hardcoded `https://database.windows.net/`, blocking Azure
Government, Azure China, and sovereign-cloud customers. Now honours an
explicit `azure_resource_url` config field and otherwise infers the URL
from the host suffix (`.usgovcloudapi.net`, `.chinacloudapi.cn`). Adds the
usual camelCase/snake_case aliases in the SQL Server normalizer.

Also surfaces Azure auth error causes: if both `@azure/identity` and `az`
CLI fail, the thrown error includes both hints (redacted) so users know why
rather than seeing the generic "install @azure/identity or run az login"
message.

Tests: adds `data-diff-cross-dialect.test.ts` covering the cross-dialect
partition WHERE routing and the `joindiff` guard; extends
`data-diff-cte.test.ts` with dialect-aware quoting assertions for tsql,
fabric, and mysql; extends `sqlserver-unit.test.ts` with cache hit / expiry
refresh / client-id keyed cache tests, commercial/gov/china/custom resource
URL resolution, and the combined-error-hints surface.

All 41 sqlserver driver tests, 24 data-diff orchestrator tests, and 214
normalize/connections tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 17:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds first-class SQL Server (MSSQL) and Microsoft Fabric support to the data-parity/data-diff workflow, including dialect mapping to Rust engine dialect names, cross-warehouse-safe CTE injection, and expanded driver authentication/normalization/docs.

Changes:

  • Add SQL Server driver capabilities (Azure AD auth flows, TOP injection, schema introspection, result shaping) and upgrade mssql to v12.
  • Add warehouse-type → Rust dialect mapping (sqlserver/mssqltsql, fabricfabric, etc.) plus dialect-aware quoting and side-specific CTE injection for SQL-query mode and partitioned diffs.
  • Extend registry/normalization/tests/docs to recognize fabric as a supported connection type routed through the SQL Server driver.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/opencode/test/altimate/driver-normalize.test.ts Adds coverage ensuring fabric configs normalize using SQL Server aliasing.
packages/opencode/test/altimate/data-diff-dialect.test.ts Adds tests for warehouse-type → Rust dialect mapping.
packages/opencode/test/altimate/data-diff-cte.test.ts Adds tests for query detection, dialect-aware quoting, and side-specific CTE injection.
packages/opencode/test/altimate/data-diff-cross-dialect.test.ts Adds tests for cross-dialect partition filtering and joindiff cross-warehouse guard behavior.
packages/opencode/test/altimate/connections.test.ts Adds coverage for fabric being recognized/registered as a connection type.
packages/opencode/src/altimate/tools/data-diff.ts Adjusts diff output formatting for missing diff row values.
packages/opencode/src/altimate/native/connections/registry.ts Adds fabric → sqlserver driver routing.
packages/opencode/src/altimate/native/connections/data-diff.ts Implements dialect mapping, dialect-aware quoting, side-specific CTE injection, and cross-dialect partition handling.
packages/drivers/test/sqlserver-unit.test.ts Adds extensive unit tests for the SQL Server driver, auth flows, and result shaping.
packages/drivers/src/sqlserver.ts Implements SQL Server driver with Azure AD token acquisition/caching, ConnectionPool isolation, and result shaping.
packages/drivers/src/normalize.ts Adds SQL Server/Fabric config alias normalization for auth and Azure fields.
packages/drivers/package.json Upgrades mssql dependency and adds optional peer dependency on @azure/identity.
bun.lock Updates lockfile for mssql/tedious upgrade and peer dependency metadata.
.opencode/skills/data-parity/SKILL.md Documents SQL Server/Fabric configuration, requirements, and behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/opencode/src/altimate/tools/data-diff.ts Outdated
Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
packages/drivers/src/sqlserver.ts (2)

255-262: ⚠️ Potential issue | 🟡 Minor

Fallback to mssql.connect() still defeats per-pool isolation.

This was raised in a prior review and remains unresolved. Since packages/drivers/package.json now pins mssql@^12.0.0 where ConnectionPool is a stable named export, MssqlConnectionPool should never be falsy in practice; if it somehow is, silently using the global shared pool re-introduces the cross-database interference bug the explicit ConnectionPool path was added to fix. Throw instead.

🔒️ Proposed fix
-      if (MssqlConnectionPool) {
-        pool = new MssqlConnectionPool(mssqlConfig)
-        await pool.connect()
-      } else {
-        pool = await mssql.connect(mssqlConfig)
-      }
+      if (!MssqlConnectionPool) {
+        throw new Error(
+          "mssql.ConnectionPool is not available — the installed `mssql` package is too old. Upgrade to mssql@^12.",
+        )
+      }
+      pool = new MssqlConnectionPool(mssqlConfig)
+      await pool.connect()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 255 - 262, The fallback
branch that calls mssql.connect() reintroduces the global shared pool bug;
replace the fallback with a hard failure instead of silently using the global
pool: when MssqlConnectionPool is falsy, throw a descriptive error indicating
ConnectionPool is required (referencing MssqlConnectionPool, ConnectionPool, and
the pool initialization logic) so callers know the driver cannot operate without
the explicit ConnectionPool; do not call mssql.connect() anywhere in this
module.

302-312: ⚠️ Potential issue | 🟡 Minor

Mixed named + multi-unnamed columns still drop named headers.

Already flagged in an earlier review and not addressed. For SELECT name, COUNT(*), SUM(x) FROM t (recordset row { name: "alice", "": [42, 100] }), namedKeys.length is 2 but sampleFlat.length is 3, so the branch falls through to synthesizing col_0..col_2, losing the real name header. Preserve the known prefix and only synthesize names for the flattened tail.

♻️ Proposed fix
-      const rows = limitedRecordset.map(flattenRow)
-      const sampleFlat = rows.length > 0 ? rows[0] : []
-      const namedKeys = recordset.length > 0 ? Object.keys(recordset[0]) : []
-      const columns =
-        namedKeys.length === sampleFlat.length
-          ? namedKeys
-          : sampleFlat.length > 0
-            ? sampleFlat.map((_: any, i: number) => `col_${i}`)
-            : (result.recordset?.columns
-                ? Object.keys(result.recordset.columns)
-                : [])
+      const rows = limitedRecordset.map(flattenRow)
+      const sampleFlat = rows.length > 0 ? rows[0] : []
+      const sampleRow = recordset[0] ?? {}
+      const columns: string[] = []
+      if (sampleFlat.length === 0) {
+        if (result.recordset?.columns) columns.push(...Object.keys(result.recordset.columns))
+      } else {
+        let i = 0
+        for (const [k, v] of Object.entries(sampleRow)) {
+          if (k === "" && Array.isArray(v)) {
+            for (let j = 0; j < v.length; j++) columns.push(`col_${i++}`)
+          } else {
+            columns.push(k === "" ? `col_${i}` : k)
+            i++
+          }
+        }
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 302 - 312, The code currently
drops known named headers when some columns are flattened (variables:
limitedRecordset, flattenRow, rows, sampleFlat, namedKeys, columns), so change
the columns construction to preserve the namedKeys prefix and only synthesize
names for any extra flattened entries: if namedKeys.length === sampleFlat.length
use namedKeys; if namedKeys.length < sampleFlat.length set columns =
[...namedKeys, ...sampleFlat.slice(namedKeys.length).map((_, i) =>
`col_${namedKeys.length + i}`)]; otherwise (sampleFlat shorter) keep the current
fallback logic that uses sampleFlat or result.recordset?.columns as before.
Ensure this logic replaces the current ternary chain building columns.
packages/drivers/test/sqlserver-unit.test.ts (1)

176-181: ⚠️ Potential issue | 🟡 Minor

recordset_columns is still not a real mssql shape.

Previously flagged; the key remains incorrect. mssql exposes metadata at result.recordset.columns, not as a sibling recordset_columns, and the driver (see sqlserver.ts line 310) reads result.recordset?.columns. The test passes only because recordset is empty, so future readers will copy the wrong pattern.

💚 Proposed fix
-      mockQueryResult = { recordset: [], recordset_columns: {} }
+      const recordset: any = []
+      recordset.columns = {}
+      mockQueryResult = { recordset }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 176 - 181, The
test constructs a fake mssql result with the wrong metadata shape—using
mockQueryResult.recordset_columns instead of the real mssql shape
result.recordset.columns—so update the test ("empty result returns correctly" in
packages/drivers/test/sqlserver-unit.test.ts) to build mockQueryResult with
recordset: [] and an empty columns object under recordset.columns (i.e.,
mockQueryResult.recordset = Object.assign([], { columns: {} })) so the driver
code in sqlserver.ts (which reads result.recordset?.columns) exercises the
correct shape when calling connector.execute("SELECT * FROM t").
🧹 Nitpick comments (3)
packages/opencode/test/altimate/data-diff-cross-dialect.test.ts (1)

114-127: Strengthen the same-warehouse joindiff positive-path assertions.

This test only checks result.success === true, which passes as long as no guard fires — it won’t catch regressions where joindiff silently degrades (e.g., falls back to hashdiff, skips execution, or produces empty SQL). Since the negative test above asserts sqlLog.length === 0, the symmetric positive assertion is cheap and much stronger.

🧪 Suggested additional assertions
     const result = await runDataDiff({
       source: "dbo.orders",
       target: "dbo.orders_v2",
       key_columns: ["id"],
       source_warehouse: "shared",
       target_warehouse: "shared",
       algorithm: "joindiff",
     })
     expect(result.success).toBe(true)
+    // Joindiff must actually dispatch SQL to the shared warehouse
+    expect(sqlLog.length).toBeGreaterThan(0)
+    expect(sqlLog.every((x) => x.warehouse === "shared")).toBe(true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts` around lines
114 - 127, The test "same-warehouse joindiff is allowed" currently only checks
result.success; strengthen it by also asserting that the joindiff actually
executed SQL instead of silently degrading: after calling runDataDiff (the test
using Registry.setConfigs and runDataDiff), assert that sqlLog (the request/SQL
capture used in other tests) contains entries (e.g., sqlLog.length > 0) and that
the executed statement or logged payload indicates joindiff behavior (e.g.,
contains "JOIN" or the joindiff-specific marker), so failures where joindiff
falls back or produces no SQL are caught.
packages/drivers/src/sqlserver.ts (2)

170-186: execSync blocks the event loop inside an async path.

The Azure CLI fallback uses synchronous execSync with a 15-second timeout inside acquireAzureToken(), which will stall the entire event loop while waiting for az account get-access-token. Prefer execFile promisified (or spawn) so that other tasks (e.g. in-flight pool requests, other connectors initializing in parallel) continue to progress. Also, passing the command as a single formatted string through execSync is a minor shell-injection concern if resourceUrl is ever derived from untrusted input — execFile with an argv array avoids that entirely.

♻️ Suggested approach
-            try {
-              const { execSync } = await import("node:child_process")
-              const out = execSync(
-                `az account get-access-token --resource ${resourceUrl} --query accessToken -o tsv`,
-                { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] },
-              ).trim()
+            try {
+              const { execFile } = await import("node:child_process")
+              const { promisify } = await import("node:util")
+              const execFileP = promisify(execFile)
+              const { stdout } = await execFileP(
+                "az",
+                ["account", "get-access-token", "--resource", resourceUrl, "--query", "accessToken", "-o", "tsv"],
+                { encoding: "utf-8", timeout: 15000 },
+              )
+              const out = stdout.trim()
               if (out) {
                 token = out
                 expiresAt = parseTokenExpiry(out)
               }
             } catch (err: any) {
               azCliStderr = String(err?.stderr ?? err?.message ?? "").slice(0, 200).trim()
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 170 - 186, The use of
synchronous execSync inside acquireAzureToken blocks the event loop; replace it
with a non-blocking child process call (e.g., promisified execFile or spawn) to
run "az" with arguments ["account","get-access-token","--resource",
resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr
asynchronously, enforce the 15s timeout via timers or the child process API, and
on success set token and expiresAt (via parseTokenExpiry(out)); on failure
capture stderr/message into azCliStderr (trim to 200 chars) and preserve
existing error-handling flow so behavior of token, expiresAt, and azCliStderr
remains the same.

140-204: Concurrent cache-miss calls will parallel-fetch the same token.

If two connect() calls for the same (resource, clientId) start before either populates the cache, both will independently run DefaultAzureCredential.getToken() (or the az CLI) and both write to the cache. This is wasteful for AAD throttling and surprising during app startup where several diffs connect in parallel. Caching the in-flight Promise<string> (rather than the resolved token) de-duplicates concurrent acquisitions for free — and on rejection you simply evict and let the next caller retry.

const inflight = new Map<string, Promise<string>>()
// inside acquireAzureToken:
const existing = inflight.get(cacheKey)
if (existing) return existing
const p = (async () => { /* existing acquisition body */ })()
  .finally(() => inflight.delete(cacheKey))
inflight.set(cacheKey, p)
return p
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 140 - 204, acquireAzureToken
currently allows concurrent cache-miss callers to run duplicate AAD/az CLI
fetches; introduce an inflight Map<string, Promise<string>> (e.g., inflight)
keyed by cacheKey and at the start of acquireAzureToken return an existing
inflight.get(cacheKey) if present, otherwise wrap the entire acquisition logic
in an async Promise, set inflight.set(cacheKey, promise) before awaiting it, and
in promise.finally() delete the inflight entry so failed attempts can be
retried; keep using tokenCache as before (check cached token first, write
resolved token into tokenCache) and on rejection ensure the inflight entry is
removed so subsequent callers will create a new fetch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 576-596: The "different clientIds cache separately" test currently
only asserts both connections received "shared-token" which doesn't prove
separate caching; modify the test in the test block named "different clientIds
cache separately" to either (1) make azureIdentityState.tokenOverride return
distinct tokens per call (so the first connect yields "token-1" and the second
"token-2") and assert mockConnectCalls[0].authentication.options.token and
mockConnectCalls[1].authentication.options.token differ accordingly, or (2) keep
the shared-token but assert the underlying getToken mock was invoked twice (e.g.
check the getToken invocation counter after resetMocks and the two connect()
calls equals 2) to prove cache keying is per azure_client_id; refer to
azureIdentityState.tokenOverride, resetMocks(), connect(...), and
mockConnectCalls to implement the change.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 27-42: The module-level mocks created with
mock.module("@altimateai/altimate-core", ...) (and the registry mock) can leak
across tests; add cleanup by calling mock.restoreModule() in an afterEach (or
afterAll) to restore original module exports, or replace the module-level
mock.module usage with scoped spyOn calls inside the test suite setup (e.g.,
inside beforeEach) so the mock is not process-scoped; update the test file to
call mock.restoreModule() after each test or convert the mocks to spyOn to
prevent cross-file leakage.

---

Duplicate comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 255-262: The fallback branch that calls mssql.connect()
reintroduces the global shared pool bug; replace the fallback with a hard
failure instead of silently using the global pool: when MssqlConnectionPool is
falsy, throw a descriptive error indicating ConnectionPool is required
(referencing MssqlConnectionPool, ConnectionPool, and the pool initialization
logic) so callers know the driver cannot operate without the explicit
ConnectionPool; do not call mssql.connect() anywhere in this module.
- Around line 302-312: The code currently drops known named headers when some
columns are flattened (variables: limitedRecordset, flattenRow, rows,
sampleFlat, namedKeys, columns), so change the columns construction to preserve
the namedKeys prefix and only synthesize names for any extra flattened entries:
if namedKeys.length === sampleFlat.length use namedKeys; if namedKeys.length <
sampleFlat.length set columns = [...namedKeys,
...sampleFlat.slice(namedKeys.length).map((_, i) => `col_${namedKeys.length +
i}`)]; otherwise (sampleFlat shorter) keep the current fallback logic that uses
sampleFlat or result.recordset?.columns as before. Ensure this logic replaces
the current ternary chain building columns.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 176-181: The test constructs a fake mssql result with the wrong
metadata shape—using mockQueryResult.recordset_columns instead of the real mssql
shape result.recordset.columns—so update the test ("empty result returns
correctly" in packages/drivers/test/sqlserver-unit.test.ts) to build
mockQueryResult with recordset: [] and an empty columns object under
recordset.columns (i.e., mockQueryResult.recordset = Object.assign([], {
columns: {} })) so the driver code in sqlserver.ts (which reads
result.recordset?.columns) exercises the correct shape when calling
connector.execute("SELECT * FROM t").

---

Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 170-186: The use of synchronous execSync inside acquireAzureToken
blocks the event loop; replace it with a non-blocking child process call (e.g.,
promisified execFile or spawn) to run "az" with arguments
["account","get-access-token","--resource",
resourceUrl,"--query","accessToken","-o","tsv"], capture stdout/stderr
asynchronously, enforce the 15s timeout via timers or the child process API, and
on success set token and expiresAt (via parseTokenExpiry(out)); on failure
capture stderr/message into azCliStderr (trim to 200 chars) and preserve
existing error-handling flow so behavior of token, expiresAt, and azCliStderr
remains the same.
- Around line 140-204: acquireAzureToken currently allows concurrent cache-miss
callers to run duplicate AAD/az CLI fetches; introduce an inflight Map<string,
Promise<string>> (e.g., inflight) keyed by cacheKey and at the start of
acquireAzureToken return an existing inflight.get(cacheKey) if present,
otherwise wrap the entire acquisition logic in an async Promise, set
inflight.set(cacheKey, promise) before awaiting it, and in promise.finally()
delete the inflight entry so failed attempts can be retried; keep using
tokenCache as before (check cached token first, write resolved token into
tokenCache) and on rejection ensure the inflight entry is removed so subsequent
callers will create a new fetch.

In `@packages/opencode/test/altimate/data-diff-cross-dialect.test.ts`:
- Around line 114-127: The test "same-warehouse joindiff is allowed" currently
only checks result.success; strengthen it by also asserting that the joindiff
actually executed SQL instead of silently degrading: after calling runDataDiff
(the test using Registry.setConfigs and runDataDiff), assert that sqlLog (the
request/SQL capture used in other tests) contains entries (e.g., sqlLog.length >
0) and that the executed statement or logged payload indicates joindiff behavior
(e.g., contains "JOIN" or the joindiff-specific marker), so failures where
joindiff falls back or produces no SQL are caught.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dd24dc82-bab2-4760-ab04-8fc4b312c420

📥 Commits

Reviewing files that changed from the base of the PR and between 1977232 and 872e082.

📒 Files selected for processing (6)
  • packages/drivers/src/normalize.ts
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
  • packages/opencode/test/altimate/data-diff-cte.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/drivers/src/normalize.ts
  • packages/opencode/test/altimate/data-diff-cte.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts

Comment thread packages/drivers/test/sqlserver-unit.test.ts
Comment thread packages/opencode/test/altimate/data-diff-cross-dialect.test.ts Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/drivers/src/sqlserver.ts">

<violation number="1" location="packages/drivers/src/sqlserver.ts:50">
P2: Normalize explicit `azure_resource_url` to include a trailing slash before appending `.default`, otherwise `getToken` can receive an invalid scope string.</violation>
</file>

<file name="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts">

<violation number="1" location="packages/opencode/test/altimate/data-diff-cross-dialect.test.ts:97">
P2: Use the same dialect on both warehouses here. As written, this only covers mixed-dialect diffing and won't catch same-dialect cross-warehouse joindiff regressions.</violation>
</file>

<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:870">
P1: `crossWarehouse` is derived from dialect equality, so joindiff is incorrectly allowed across different warehouses that use the same dialect.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts Outdated
Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/opencode/test/altimate/data-diff-cross-dialect.test.ts Outdated
…lot)

Addresses the remaining issues raised by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer on top of the multi-model consensus fix.

### CRITICAL

- **`@azure/identity` peer dep removed** (`drivers/package.json`)
  `mssql@12` → `tedious@19` bundles `@azure/identity ^4.2.1` as a regular
  dependency. Declaring it here as an optional peer was redundant and caused
  transitive-version-drift concerns. Users get the correct version
  automatically through the tedious chain; our lazy import handles the
  browser-bundle edge case itself.

### MAJOR

- **Cross-dialect date partition literal normalization** (`data-diff.ts`)
  `buildPartitionDiscoverySQL` on MSSQL returns a JS `Date` object, stringified
  upstream as `"Mon Jan 01 2024 …"`. `CONVERT(DATE, …, 23)` rejects that format.
  Normalize `partitionValue` to ISO `yyyy-mm-dd` before dialect casting so the
  T-SQL/Fabric path works end-to-end on dates discovered from MSSQL sources.

- **`crossWarehouse` uses resolved warehouse identity** (`data-diff.ts`)
  Previous commit gated on dialect compare, which treated two independent
  MSSQL instances as "same warehouse" and would have let `joindiff` route a
  JOIN through a warehouse that can't resolve the other side's base tables.
  Now resolves both sides' warehouse name (falling back to the default
  warehouse when a side is omitted) and compares identities — identity-based
  gating handles both the "undefined vs default" case (cubic) and the
  "same-dialect, different instance" case (Copilot).

- **Drop `mssql.connect()` fallback** (`sqlserver.ts`)
  `mssql@^12` guarantees `ConnectionPool` as a named export. The fallback
  silently re-introduced the global-shared-pool bug this branch was added
  to fix. Now throws a descriptive error if `ConnectionPool` is missing —
  cross-database pool interference cannot regress.

- **Non-string `config.authentication` guarded** (`sqlserver.ts`)
  Caller passing a pre-built `{ type, options }` block (or `null`) previously
  crashed with `TypeError: rawAuth.toLowerCase is not a function`. Now only
  applies the shorthand lookup when `rawAuth` is a string; other values pass
  through so tedious can handle them or reject them with its own error.

- **Unknown `azure-active-directory-*` subtype fails fast** (`sqlserver.ts`)
  Typos or future tedious subtypes previously dropped through all `else if`
  branches, producing a config with `encrypt: true` but no `authentication`
  block. tedious then surfaced an opaque error far from the root cause.
  Now throws with the offending subtype and the supported list.

- **`execSync` replaced with async `exec`** (`sqlserver.ts`)
  The `az account get-access-token` CLI fallback previously blocked the
  event loop for up to 15s. Switched to `util.promisify(exec)` so the
  connection path stays non-blocking.

- **Mixed named + unnamed column derivation preserves headers** (`sqlserver.ts`)
  Previously `SELECT name, COUNT(*), SUM(x)` produced either `["name", ""]`
  (blank header) or `["col_0", "col_1", "col_2"]` (lost `name`). Rewrote
  column/row derivation to iterate in one pass, preserving known named
  columns and synthesizing `col_N` only for expanded `""`-key positions.

### MINOR

- **`(no values)` fallback for empty `diff_row.values` array** (`tools/data-diff.ts`)
  `[].join(" | ") ?? "(no values)"` never fires because `""` is falsy-but-not-
  nullish. Gate on `d.values?.length` instead.

### Test / docs

- `sqlserver-unit.test.ts`: token-cache client-id test now counts actual
  `getToken` invocations (previous version only verified both got the same
  mocked token, which proved nothing about keying).
- `sqlserver-unit.test.ts`: "empty result" test now mirrors the real mssql
  shape (`recordset.columns` is a property *on* the recordset array, not a
  sibling key).
- `sqlserver-unit.test.ts`: added mixed-column regression tests — "name +
  COUNT + SUM" and "single unnamed column" — to lock in the derivation fix.
- `sqlserver-unit.test.ts`: stubbed async `exec` via `util.promisify.custom`
  so tests drive both the `execSync` legacy path and the new async path.
- `SKILL.md`: Fabric config fenced block now declares `yaml` (markdownlint
  MD040).

All tests: 43/43 sqlserver driver + 238/238 opencode test suite.

Attribution: findings identified by coderabbitai, cubic-dev-ai, and the
Copilot PR reviewer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/drivers/src/sqlserver.ts (1)

147-218: Consider single-flight dedup for concurrent cold-cache token acquisitions.

If two (or many) connect() calls for the same (resourceUrl, clientId) race before the first token lands in the cache, each call runs the full DefaultAzureCredentialaz path independently — exactly the "wasteful, risks throttling" scenario the cache comment flags. A small in-flight map (Map<string, Promise<string>>) next to tokenCache collapses concurrent cold requests to a single network roundtrip.

♻️ Suggested approach
 const tokenCache = new Map<string, { token: string; expiresAt: number }>()
+const inflightTokenRequests = new Map<string, Promise<string>>()

Then in acquireAzureToken:

         const acquireAzureToken = async (): Promise<string> => {
           const cached = tokenCache.get(cacheKey)
           if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) {
             return cached.token
           }
+          const existing = inflightTokenRequests.get(cacheKey)
+          if (existing) return existing
+          const p = (async () => {
+            // ... existing acquisition + cache-set logic ...
+            return token!
+          })().finally(() => inflightTokenRequests.delete(cacheKey))
+          inflightTokenRequests.set(cacheKey, p)
+          return p
         }

Low priority — only matters when multiple pools spin up concurrently at process start — but it's cheap insurance for the documented motivation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 147 - 218, acquireAzureToken
can run multiple concurrent cold-cache fetches; add a single-flight map (e.g.
const inFlight = new Map<string, Promise<string>>()) alongside tokenCache and
use cacheKey as the key: at start of acquireAzureToken check if
inFlight.has(cacheKey) and return the stored Promise; if not, create the Promise
that performs the existing token acquisition logic (DefaultAzureCredential path
then az CLI fallback), store it in inFlight before awaiting internals, and
ensure you remove(inFlight) in both fulfillment and rejection handlers so
failures don't permanently block retries; keep existing tokenCache.set and
TOKEN_REFRESH_MARGIN_MS logic unchanged.
packages/drivers/test/sqlserver-unit.test.ts (1)

119-131: Optional: drop the execSync mock branch now that production uses async exec.

The driver no longer calls execSync (it was replaced with promisify(exec) in packages/drivers/src/sqlserver.ts Line 183-189). The execSync stub here is dead mock-surface — keeping it risks implying execSync is still a supported fallback path in the driver.

♻️ Suggested cleanup
 mock.module("node:child_process", () => ({
   ...realChildProcess,
-  execSync: (cmd: string) => {
-    cliState.lastCmd = cmd
-    if (cliState.throwError) {
-      const e: any = new Error(cliState.throwError.message ?? "az failed")
-      e.stderr = cliState.throwError.stderr
-      throw e
-    }
-    return cliState.output
-  },
   exec: execStub,
 }))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 119 - 131, Remove
the dead execSync branch from the test mock: in the mock.module call that
currently spreads realChildProcess and defines execSync and exec, delete the
execSync property and its closure (the cliState handling and thrown Error
creation) so only exec: execStub remains; keep the realChildProcess spread and
existing execStub usage (referencing mock.module, execStub, and cliState) to
reflect that production now uses the async promisified exec path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 186-200: The current call uses execAsync with a shell-interpolated
command including resourceUrl (user input), risking shell injection; replace
that usage by invoking execFile (or your promisified execFileAsync) to call "az"
with args ["account", "get-access-token", "--resource", resourceUrl, "--query",
"accessToken", "-o", "tsv"] so the value is passed as an argv element (no
shell), retain the same options (encoding, timeout) and preserve the existing
handling of stdout -> out -> token/expiresAt and azCliStderr in the catch; also
update the related test stub to mock execFile/execFileAsync instead of execAsync
or assert the argv array.

In `@packages/opencode/src/altimate/native/connections/data-diff.ts`:
- Around line 810-820: runPartitionedDiff currently replaces
params.source/params.target with SELECT subqueries and then calls runDataDiff,
which prevents discoverExtraColumns from recognizing real table names and causes
extra_columns to be lost; fix this by running discoverExtraColumns (and
capturing discoveredExcluded) inside runPartitionedDiff while
params.source/params.target are still plain table names, then pass the
discovered extra_columns into the recursive runDataDiff call (e.g., set
params.extra_columns and params.excluded_audit_columns or forward
discoveredExcluded) instead of letting runDataDiff attempt discovery on the
SELECT strings; ensure you do not set partition_column in the recursive call as
before.

---

Nitpick comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 147-218: acquireAzureToken can run multiple concurrent cold-cache
fetches; add a single-flight map (e.g. const inFlight = new Map<string,
Promise<string>>()) alongside tokenCache and use cacheKey as the key: at start
of acquireAzureToken check if inFlight.has(cacheKey) and return the stored
Promise; if not, create the Promise that performs the existing token acquisition
logic (DefaultAzureCredential path then az CLI fallback), store it in inFlight
before awaiting internals, and ensure you remove(inFlight) in both fulfillment
and rejection handlers so failures don't permanently block retries; keep
existing tokenCache.set and TOKEN_REFRESH_MARGIN_MS logic unchanged.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 119-131: Remove the dead execSync branch from the test mock: in
the mock.module call that currently spreads realChildProcess and defines
execSync and exec, delete the execSync property and its closure (the cliState
handling and thrown Error creation) so only exec: execStub remains; keep the
realChildProcess spread and existing execStub usage (referencing mock.module,
execStub, and cliState) to reflect that production now uses the async
promisified exec path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2c693281-52dc-47f7-b49c-a79a1d7331cd

📥 Commits

Reviewing files that changed from the base of the PR and between 872e082 and 38cfb0e.

📒 Files selected for processing (6)
  • .opencode/skills/data-parity/SKILL.md
  • packages/drivers/package.json
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/src/altimate/tools/data-diff.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/drivers/package.json
  • .opencode/skills/data-parity/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opencode/src/altimate/tools/data-diff.ts

Comment thread packages/drivers/src/sqlserver.ts Outdated
Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts
Commit 38cfb0e removed `@azure/identity` from the drivers package's
`peerDependencies` (tedious already bundles it), but the lockfile's
`packages/drivers` workspace section still carried the corresponding
`peerDependencies` and `optionalPeers` blocks. CI running
`bun install --frozen-lockfile` would fail on the drift.

Minimal edit — just removes the two stale blocks. No resolution changes
(`bun install --frozen-lockfile` passes with "no changes").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">

<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:605">
P1: The new date parsing rewrites MySQL/MariaDB week partition values (`YYYY-WW`) into `YYYY-MM-DD`, causing partition WHERE clauses that never match.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread packages/opencode/src/altimate/native/connections/data-diff.ts Outdated
suryaiyer95 and others added 2 commits April 17, 2026 15:52
The prior integration-style test mocked the Registry module globally with
`mock.module(".../registry", ...)`, which leaks across all test files in
bun:test's single-process runner. That caused 14 unrelated tests in
`connections.test.ts`, `telemetry-safety.test.ts`, and
`dbt-first-execution.test.ts` to fail in CI.

Additionally, the test relied on `mock.module("@altimateai/altimate-core")`
to supply a fake `DataParitySession`. The npm-published 0.2.6 of that
package does not export `DataParitySession` (sessions are only in the
locally-built `altimate-core-internal` binary), and Bun's `mock.module`
cannot override a package that another test file has already imported —
so the integration test was structurally unreliable.

Resolution:

1. **Export pure SQL-builder helpers** from `data-diff.ts`
   (`dateTruncExpr`, `buildPartitionWhereClause`) and unit-test them
   directly. No module mocking required; the test directly exercises the
   logic the CRITICAL/MAJOR fix changed.

2. **Move the `joindiff` + cross-warehouse guard earlier** in `runDataDiff`
   — before the NAPI import. Semantically identical for callers (guard
   still fires, same error message, `steps: 0`), but now it can be
   integration-tested without any NAPI mock. Preserves end-to-end
   wiring coverage for the guard.

3. **Rewrite `data-diff-cross-dialect.test.ts`** as pure-function unit
   tests for the partition WHERE logic + a real `runDataDiff` call for
   the joindiff guard. No more cross-file mock pollution.

Functionality unchanged:
- `runDataDiff` behavior for real callers is identical. The only
  observable difference is error-ordering: if a caller simultaneously
  omits NAPI and passes `joindiff + cross-warehouse`, they now get the
  "joindiff requires same warehouse" error instead of the NAPI-missing
  error. That's strictly better UX — NAPI availability is a deployment
  concern, `joindiff`+cross-warehouse is a user error.
- `buildPartitionWhereClause` and `dateTruncExpr` are now exported but
  semantically unchanged — same inputs, same outputs.

Test results:
- 2821 altimate tests pass, 0 fail
- 43 sqlserver driver tests pass, 0 fail
- The 19 remaining full-suite failures (`mcp/`, `tool/project-scan`,
  `plan-approval-phrase`) are pre-existing on `main` and unrelated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/MINOR)

Addresses 5 substantive issues raised by the latest round of bot reviews.

### P1 / MAJOR

- **MySQL/MariaDB week-partition values no longer corrupted** (cubic P1,
  data-diff.ts:610) — the prior ISO `yyyy-mm-dd` normalization applied to
  every dialect silently rewrote MySQL `DATE_FORMAT(%Y-%u)` outputs like
  `"2024-42"` into invalid dates, producing WHERE clauses that never
  match. Scope the normalization to T-SQL / Fabric only — those use
  `CONVERT(DATE, …, 23)` which is the only code path that requires ISO.
  Postgres, MySQL, ClickHouse, BigQuery, Oracle all get the raw value
  verbatim, matching their own `DATE_TRUNC`/`toStartOf*` output.

- **Partitioned diff no longer drops extra_columns** (coderabbit MAJOR,
  data-diff.ts:824) — the partition fix wraps each side as a SELECT
  subquery before recursing. `discoverExtraColumns` skips SQL queries
  (only inspects plain table names), so the recursive `runDataDiff` fell
  through to key-only comparison, silently losing value-level diffs.
  Now `runPartitionedDiff` runs discovery ONCE on the plain source
  table up-front and passes the resolved `extra_columns` explicitly to
  each recursive call. Audit-column exclusion metadata is also
  propagated to the aggregated result for user reporting.

### P2 / MINOR

- **`azure_resource_url` trailing slash normalized** (cubic P2,
  sqlserver.ts:50) — an explicit `"https://custom-host"` (no slash)
  would produce an invalid OAuth scope `"https://custom-host.default"`.
  Enforce a trailing slash in `resolveAzureResourceUrl`.

- **`az account get-access-token` uses `execFile`** (coderabbit,
  sqlserver.ts:200) — replaces `exec(<shell command string>)` with
  `execFile("az", [args])` so user-supplied `azure_resource_url` can't
  introduce shell metacharacters into the command string. Also updates
  the test harness to stub both `exec` and `execFile`.

### Test isolation / coverage

- **Added same-dialect cross-warehouse joindiff test** (cubic,
  data-diff-cross-dialect.test.ts:97) — two MSSQL servers with
  different hosts must still be gated by the joindiff guard; previous
  tests only exercised mixed dialects.

- **Added MySQL week-partition regression tests** — prevent future
  revivals of the dialect-unaware ISO rewrite.

- **Added trailing-slash `azure_resource_url` test.**

Test results:
- 44/44 sqlserver driver tests pass
- 2824/2824 altimate tests pass, 0 fail
- Remaining full-suite failures (`mcp/`, `tool/project-scan`,
  `plan-approval-phrase`) are pre-existing on `main`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai bot commented Apr 17, 2026

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/drivers/src/sqlserver.ts (1)

316-328: ⚠️ Potential issue | 🟠 Major

Preserve DISTINCT/ALL placement when injecting TOP.

Line 325 inserts TOP immediately after SELECT, producing invalid T-SQL: SELECT DISTINCT col FROM t becomes SELECT TOP 1001 DISTINCT col FROM t. Per Microsoft's official syntax, DISTINCT/ALL must precede TOP: SELECT [ ALL | DISTINCT ] [ TOP ( expression ) ] .... Update the regex to capture and preserve these quantifiers, inserting TOP after them.

🐛 Proposed fix
       let query = sql
       const isSelectLike = /^\s*SELECT\b/i.test(sql)
+      const hasTop = /^\s*SELECT\s+(?:ALL\s+|DISTINCT\s+)?TOP\b/i.test(sql)
       // SQL Server uses TOP, not LIMIT
       if (
         isSelectLike &&
         effectiveLimit &&
-        !/\bTOP\b/i.test(sql) &&
+        !hasTop &&
         !/\bLIMIT\b/i.test(sql)
       ) {
-        // Insert TOP after SELECT
+        // Insert TOP after SELECT, preserving SELECT DISTINCT/ALL syntax.
         query = sql.replace(
-          /^(\s*SELECT\s)/i,
-          `$1TOP ${effectiveLimit + 1} `,
+          /^(\s*SELECT\s+)((?:ALL|DISTINCT)\s+)?/i,
+          (_match, selectPrefix, quantifier = "") =>
+            `${selectPrefix}${quantifier}TOP ${effectiveLimit + 1} `,
         )
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 316 - 328, The injection of
TOP after SELECT in the SQL transform mistakenly places TOP before any optional
quantifier (DISTINCT/ALL); update the replacement logic in the block that checks
isSelectLike/effectiveLimit (the code using sql.replace) to capture an optional
ALL|DISTINCT token (and following whitespace) and insert `TOP ${effectiveLimit +
1}` after that token when present, i.e., change the regex so it matches `SELECT`
plus optional `DISTINCT|ALL` group and use replacement groups to preserve the
quantifier positioning and spacing while inserting TOP in the correct spot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 54-66: In resolveAzureResourceUrl, validate that
config.azure_resource_url is a string before calling string methods: if the
explicit value exists but is not a string, throw a clear configuration error;
otherwise use the explicit string, or fallback to the existing host-based
detection (using config.host) and then ensure the returned URL ends with a
slash. This prevents raw.endsWith is not a function by checking typeof
config.azure_resource_url === "string" (and throwing) before using raw.endsWith
and preserves current behavior of resolveAzureResourceUrl.
- Around line 151-173: The token cache and acquisition ignore
config.azure_tenant_id, causing tenant cross-use; update the cacheKey in
acquireAzureToken to include config.azure_tenant_id (e.g.,
`${resourceUrl}|${clientId}|${tenantId||""}`) and when creating
DefaultAzureCredential (in the acquireAzureToken function) pass the tenantId
option when config.azure_tenant_id is present (and likewise include tenant when
any CLI/alternative credential construction supports a tenant option) so tokens
are requested and cached per tenant; reference resolveAzureResourceUrl,
acquireAzureToken, tokenCache, config.azure_tenant_id, and
DefaultAzureCredential to locate the changes.

In `@packages/drivers/test/sqlserver-unit.test.ts`:
- Around line 69-83: The test's Azure identity mock always returns a token so it
won't catch ignored azure_client_id; update the DefaultAzureCredential mock
(inside mock.module("@azure/identity")) to save constructor options into
azureIdentityState (e.g., set azureIdentityState.lastCtorOpts = opts in the
constructor) and then add an assertion in the test that
azureIdentityState.lastCtorOpts.managedIdentityClientId equals the expected
azure_client_id; repeat the same change for the other DefaultAzureCredential
mock used later in the file so both mocks validate the managedIdentityClientId
constructor option.
- Around line 88-151: The test currently only records cliState.lastCmd so a
regression back to using exec (shell interpolation) would still pass; modify the
mocks to track which child_process API was invoked (e.g., add cliState.lastApi
or similar) by setting it in the execStub and execFileStub returned from
makeChildProcessMock (or inside makeChildProcessMock based on how it's
constructed), then update the assertion at the test location to assert
cliState.lastApi === "execFile" (in addition to any existing command/content
assertions) so the test enforces use of execFile rather than exec. Ensure
references: makeChildProcessMock, execStub, execFileStub, cliState.lastCmd, and
the mock.module replacement for node:child_process are updated accordingly.

---

Outside diff comments:
In `@packages/drivers/src/sqlserver.ts`:
- Around line 316-328: The injection of TOP after SELECT in the SQL transform
mistakenly places TOP before any optional quantifier (DISTINCT/ALL); update the
replacement logic in the block that checks isSelectLike/effectiveLimit (the code
using sql.replace) to capture an optional ALL|DISTINCT token (and following
whitespace) and insert `TOP ${effectiveLimit + 1}` after that token when
present, i.e., change the regex so it matches `SELECT` plus optional
`DISTINCT|ALL` group and use replacement groups to preserve the quantifier
positioning and spacing while inserting TOP in the correct spot.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f981ad72-dfb3-4166-abdc-8bf1fe0f5d42

📥 Commits

Reviewing files that changed from the base of the PR and between 64dd815 and 282876b.

📒 Files selected for processing (4)
  • packages/drivers/src/sqlserver.ts
  • packages/drivers/test/sqlserver-unit.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts
  • packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opencode/test/altimate/data-diff-cross-dialect.test.ts
  • packages/opencode/src/altimate/native/connections/data-diff.ts

Comment on lines +54 to +66
function resolveAzureResourceUrl(config: ConnectionConfig): string {
const explicit = config.azure_resource_url as string | undefined
const raw = explicit ?? (() => {
const host = (config.host as string | undefined) ?? ""
if (host.includes(".usgovcloudapi.net") || host.includes(".datawarehouse.fabric.microsoft.us")) {
return "https://database.usgovcloudapi.net/"
}
if (host.includes(".chinacloudapi.cn")) {
return "https://database.chinacloudapi.cn/"
}
return "https://database.windows.net/"
})()
return raw.endsWith("/") ? raw : `${raw}/`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate azure_resource_url before calling string methods.

Line 55 casts an unknown config value to string; a non-string value will throw raw.endsWith is not a function. Fail fast with a clear config error instead.

🛡️ Proposed fix
 function resolveAzureResourceUrl(config: ConnectionConfig): string {
-  const explicit = config.azure_resource_url as string | undefined
+  const explicitValue = config.azure_resource_url
+  if (explicitValue != null && typeof explicitValue !== "string") {
+    throw new Error("azure_resource_url must be a string when provided")
+  }
+  const explicit = explicitValue?.trim() || undefined
   const raw = explicit ?? (() => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 54 - 66, In
resolveAzureResourceUrl, validate that config.azure_resource_url is a string
before calling string methods: if the explicit value exists but is not a string,
throw a clear configuration error; otherwise use the explicit string, or
fallback to the existing host-based detection (using config.host) and then
ensure the returned URL ends with a slash. This prevents raw.endsWith is not a
function by checking typeof config.azure_resource_url === "string" (and
throwing) before using raw.endsWith and preserves current behavior of
resolveAzureResourceUrl.

Comment on lines +151 to +173
const resourceUrl = resolveAzureResourceUrl(config)
const clientId = (config.azure_client_id as string | undefined) ?? ""
const cacheKey = `${resourceUrl}|${clientId}`

const acquireAzureToken = async (): Promise<string> => {
const cached = tokenCache.get(cacheKey)
if (cached && cached.expiresAt - Date.now() > TOKEN_REFRESH_MARGIN_MS) {
return cached.token
}

let token: string | undefined
let expiresAt: number | undefined
let azureIdentityError: unknown = null
let azCliStderr = ""

try {
const azureIdentity = await import("@azure/identity")
const credential = new azureIdentity.DefaultAzureCredential(
config.azure_client_id
? { managedIdentityClientId: config.azure_client_id as string }
: undefined,
)
const tokenResponse = await credential.getToken(`${resourceUrl}.default`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

@azure/identity DefaultAzureCredential tenantId option Azure CLI az account get-access-token --tenant

💡 Result:

In the @azure/identity JavaScript package, DefaultAzureCredential supports a tenantId option via DefaultAzureCredentialOptions. Set it explicitly when creating the credential to specify the tenant for authentication. Example: const { DefaultAzureCredential } = require("@azure/identity"); const credential = new DefaultAzureCredential({ tenantId: "your-tenant-id" }); This tenantId is passed to underlying credentials like AzureCliCredential, which calls az account get-access-token internally. Recent Azure CLI updates (merged 2025) allow --tenant with the current tenant for Cloud Shell/managed identity accounts, ensuring compatibility when the SDK passes the tenantId. To use a specific tenant with Azure CLI beforehand: - az login --tenant your-tenant-id - Or az account get-access-token --tenant your-tenant-id --resource https://management.azure.com/ (for ARM scope). The tenantId defaults to AZURE_TENANT_ID environment variable if not specified. For multi-tenant scenarios, use additionallyAllowedTenants: ["*"].

Citations:


🏁 Script executed:

# Check if azure_tenant_id is used elsewhere in the config
rg "azure_tenant_id" packages/drivers/src/sqlserver.ts

Repository: AltimateAI/altimate-code

Length of output: 163


🏁 Script executed:

# Get the context around lines 151-173 and 196-204 in sqlserver.ts
cat -n packages/drivers/src/sqlserver.ts | sed -n '145,210p'

Repository: AltimateAI/altimate-code

Length of output: 3603


🏁 Script executed:

# Check the package.json for `@azure/identity` version
rg "@azure/identity" packages/drivers/package.json -A 1

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Search for other Azure authentication flows to see how they handle azure_tenant_id
rg -B 3 -A 3 "azure_tenant_id" packages/drivers/src/

Repository: AltimateAI/altimate-code

Length of output: 1710


🏁 Script executed:

# Find and check package.json for `@azure/identity` version
find packages/drivers -name "package.json" -exec grep -l "azure" {} \; | head -1 | xargs cat | grep -A 2 -B 2 "azure"

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Check if there are multiple places in sqlserver.ts that need updating (beyond 196-204)
rg -n "execFileAsync|get-access-token" packages/drivers/src/sqlserver.ts

Repository: AltimateAI/altimate-code

Length of output: 338


🏁 Script executed:

# Get context around lines 196-204 to see the second occurrence mentioned
cat -n packages/drivers/src/sqlserver.ts | sed -n '190,210p'

Repository: AltimateAI/altimate-code

Length of output: 1137


Honor azure_tenant_id in token acquisition and cache key.

azure_tenant_id is accepted for other Azure AD flows, but the default/CLI token path ignores it and caches only by resource/clientId. Two connections targeting different tenants can reuse the wrong cached token or authenticate against the CLI/default tenant instead of the configured tenant.

🔐 Proposed fix
         const resourceUrl = resolveAzureResourceUrl(config)
         const clientId = (config.azure_client_id as string | undefined) ?? ""
-        const cacheKey = `${resourceUrl}|${clientId}`
+        const tenantId = (config.azure_tenant_id as string | undefined) ?? ""
+        const cacheKey = `${resourceUrl}|${clientId}|${tenantId}`
@@
             const azureIdentity = await import("@azure/identity")
-            const credential = new azureIdentity.DefaultAzureCredential(
-              config.azure_client_id
-                ? { managedIdentityClientId: config.azure_client_id as string }
-                : undefined,
-            )
+            const credential = new azureIdentity.DefaultAzureCredential({
+              ...(clientId ? { managedIdentityClientId: clientId } : {}),
+              ...(tenantId ? { tenantId } : {}),
+            })
@@
                 [
                   "account", "get-access-token",
                   "--resource", resourceUrl,
+                  ...(tenantId ? ["--tenant", tenantId] : []),
                   "--query", "accessToken",
                   "-o", "tsv",
                 ],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/src/sqlserver.ts` around lines 151 - 173, The token cache
and acquisition ignore config.azure_tenant_id, causing tenant cross-use; update
the cacheKey in acquireAzureToken to include config.azure_tenant_id (e.g.,
`${resourceUrl}|${clientId}|${tenantId||""}`) and when creating
DefaultAzureCredential (in the acquireAzureToken function) pass the tenantId
option when config.azure_tenant_id is present (and likewise include tenant when
any CLI/alternative credential construction supports a tenant option) so tokens
are requested and cached per tenant; reference resolveAzureResourceUrl,
acquireAzureToken, tokenCache, config.azure_tenant_id, and
DefaultAzureCredential to locate the changes.

Comment on lines +69 to +83
const azureIdentityState = {
lastScope: "" as string,
tokenOverride: null as null | { token: string; expiresOnTimestamp?: number },
throwOnGetToken: false as boolean,
}
mock.module("@azure/identity", () => ({
DefaultAzureCredential: class {
_opts: any
constructor(opts?: any) { this._opts = opts }
async getToken(scope: string) {
azureIdentityState.lastScope = scope
if (azureIdentityState.throwOnGetToken) throw new Error("mock identity failure")
if (azureIdentityState.tokenOverride) return azureIdentityState.tokenOverride
return { token: "mock-azure-token-12345", expiresOnTimestamp: Date.now() + 3600000 }
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert the credential constructor options.

This test would still pass if azure_client_id were ignored, because the mock always returns the same token. Track the DefaultAzureCredential constructor options and assert managedIdentityClientId.

💚 Proposed test fix
 const azureIdentityState = {
   lastScope: "" as string,
+  lastCredentialOpts: null as any,
   tokenOverride: null as null | { token: string; expiresOnTimestamp?: number },
   throwOnGetToken: false as boolean,
 }
 mock.module("@azure/identity", () => ({
   DefaultAzureCredential: class {
     _opts: any
-    constructor(opts?: any) { this._opts = opts }
+    constructor(opts?: any) {
+      this._opts = opts
+      azureIdentityState.lastCredentialOpts = opts ?? null
+    }
@@
     test("azure-active-directory-default with client_id passes managedIdentityClientId to credential", async () => {
       resetMocks()
+      azureIdentityState.lastCredentialOpts = null
       const c = await connect({
         host: "myserver.database.windows.net",
         database: "db",
@@
       // Token is still passed as access-token regardless of client_id
       expect(cfg.authentication.type).toBe("azure-active-directory-access-token")
       expect(cfg.authentication.options.token).toBe("mock-azure-token-12345")
+      expect(azureIdentityState.lastCredentialOpts).toEqual({
+        managedIdentityClientId: "mi-client-id",
+      })
     })

Also applies to: 372-385

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 69 - 83, The
test's Azure identity mock always returns a token so it won't catch ignored
azure_client_id; update the DefaultAzureCredential mock (inside
mock.module("@azure/identity")) to save constructor options into
azureIdentityState (e.g., set azureIdentityState.lastCtorOpts = opts in the
constructor) and then add an assertion in the test that
azureIdentityState.lastCtorOpts.managedIdentityClientId equals the expected
azure_client_id; repeat the same change for the other DefaultAzureCredential
mock used later in the file so both mocks validate the managedIdentityClientId
constructor option.

Comment on lines +88 to +151
const cliState = {
lastCmd: "" as string,
output: "mock-cli-token-fallback\n" as string,
throwError: null as null | { stderr?: string; message?: string },
}
const realChildProcess = await import("node:child_process")
const realUtil = await import("node:util")

// Helper: build a mock with callback + util.promisify.custom support so
// `promisify(child_process.exec)` or `promisify(child_process.execFile)`
// yields { stdout, stderr } exactly like the real implementation.
function makeChildProcessMock(captureCmd: (args: string) => void) {
const stub: any = (arg0: any, arg1: any, arg2: any, arg3: any) => {
// Accept both exec(cmd, opts?, cb?) and execFile(file, args?, opts?, cb?)
const cb = [arg0, arg1, arg2, arg3].find((x) => typeof x === "function")
// Pick the best "command" representation for test assertions:
// - exec: first arg is the full command string
// - execFile: first arg is the program, second arg is the args array
if (Array.isArray(arg1)) {
captureCmd(`${arg0} ${arg1.join(" ")}`)
} else {
captureCmd(String(arg0))
}
if (cliState.throwError) {
const e: any = new Error(cliState.throwError.message ?? "az failed")
e.stderr = cliState.throwError.stderr
if (cb) cb(e, "", cliState.throwError.stderr ?? "")
return { on() {}, stdout: null, stderr: null }
}
if (cb) cb(null, cliState.output, "")
return { on() {}, stdout: null, stderr: null }
}
stub[realUtil.promisify.custom] = (arg0: any, arg1: any) => {
if (Array.isArray(arg1)) {
captureCmd(`${arg0} ${arg1.join(" ")}`)
} else {
captureCmd(String(arg0))
}
if (cliState.throwError) {
const e: any = new Error(cliState.throwError.message ?? "az failed")
e.stderr = cliState.throwError.stderr
return Promise.reject(e)
}
return Promise.resolve({ stdout: cliState.output, stderr: "" })
}
return stub
}

const execStub = makeChildProcessMock((c) => { cliState.lastCmd = c })
const execFileStub = makeChildProcessMock((c) => { cliState.lastCmd = c })

mock.module("node:child_process", () => ({
...realChildProcess,
execSync: (cmd: string) => {
cliState.lastCmd = cmd
if (cliState.throwError) {
const e: any = new Error(cliState.throwError.message ?? "az failed")
e.stderr = cliState.throwError.stderr
throw e
}
return cliState.output
},
exec: execStub,
execFile: execFileStub,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert the CLI fallback uses execFile.

Both exec and execFile currently update cliState.lastCmd, so the test at Line 774 would still pass if the driver regressed back to shell interpolation. Track the invoked child-process API and assert execFile.

🧪 Proposed test hardening
 const cliState = {
   lastCmd: "" as string,
+  lastMethod: "" as "exec" | "execFile" | "execSync" | "",
   output: "mock-cli-token-fallback\n" as string,
   throwError: null as null | { stderr?: string; message?: string },
 }
@@
-function makeChildProcessMock(captureCmd: (args: string) => void) {
+function makeChildProcessMock(method: "exec" | "execFile", captureCmd: (args: string) => void) {
   const stub: any = (arg0: any, arg1: any, arg2: any, arg3: any) => {
+    cliState.lastMethod = method
@@
   stub[realUtil.promisify.custom] = (arg0: any, arg1: any) => {
+    cliState.lastMethod = method
@@
-const execStub = makeChildProcessMock((c) => { cliState.lastCmd = c })
-const execFileStub = makeChildProcessMock((c) => { cliState.lastCmd = c })
+const execStub = makeChildProcessMock("exec", (c) => { cliState.lastCmd = c })
+const execFileStub = makeChildProcessMock("execFile", (c) => { cliState.lastCmd = c })
@@
   execSync: (cmd: string) => {
+    cliState.lastMethod = "execSync"
     cliState.lastCmd = cmd
@@
       await c.connect()
+      expect(cliState.lastMethod).toBe("execFile")
       expect(cliState.lastCmd).toContain("--resource https://database.usgovcloudapi.net/")
     })

Also applies to: 764-775

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/drivers/test/sqlserver-unit.test.ts` around lines 88 - 151, The test
currently only records cliState.lastCmd so a regression back to using exec
(shell interpolation) would still pass; modify the mocks to track which
child_process API was invoked (e.g., add cliState.lastApi or similar) by setting
it in the execStub and execFileStub returned from makeChildProcessMock (or
inside makeChildProcessMock based on how it's constructed), then update the
assertion at the test location to assert cliState.lastApi === "execFile" (in
addition to any existing command/content assertions) so the test enforces use of
execFile rather than exec. Ensure references: makeChildProcessMock, execStub,
execFileStub, cliState.lastCmd, and the mock.module replacement for
node:child_process are updated accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants