Auto-detect tab-delimited CSV files with manual delimiter override #818

Draft
Copilot wants to merge 3 commits into main from copilot/fix-sanddance-tab-delimiter

Contributor

Copilot AI commented Apr 27, 2026

Tab-delimited files with a .csv extension are parsed as comma-separated, resulting in all data appearing as a single column. Since the file type is derived purely from the extension, there's no way for users to override the delimiter.

Changes

  • packages/sanddance-explorer/src/dataLoader.ts: Added guessDelimiterType(text, type) that inspects the first line of a csv-typed file and returns tsv if tabs outnumber commas. The detected type is forwarded to both vega.read() and loadDataArray(). Respects a noTypeGuess flag to honour explicit user choices.

    • Uses /\r?\n/ split to handle CRLF (Windows) line endings
    • Strips quoted strings before counting to avoid false positives from commas inside field values (e.g. "Last, First")
    • Conservative: equal tab/comma counts keep the original csv type
    • Non-csv types pass through unchanged
    • Sets DataContent.type to communicate the actual parsed type back to callers without mutating the input
  • packages/sanddance-explorer/src/interfaces.ts: Added noTypeGuess?: boolean to DataFile (bypasses auto-detection when the user explicitly picks a delimiter) and type?: DataFileType to DataContent (carries the actual parsed type back to the caller).

  • packages/sanddance-explorer/src/dialogs/dataBrowser.tsx: Added a Delimiter dropdown to the Data browser sidebar tab. It is shown only when the loaded file is CSV or TSV, and lets users switch between "Comma (,)" and "Tab" at any time — including in the VSCode extension where there is no file upload dialog. Selecting a different option immediately re-parses the data.

  • packages/sanddance-explorer/src/language.ts: Added labelDelimiter, labelDelimiterComma, and labelDelimiterTab strings.

  • packages/sanddance-explorer/src/explorer.tsx: Applies dataContent.type back to the dataFile stored in state (so the dropdown reflects the auto-detected delimiter), and wires dataFile and an onReloadFileType handler into DataBrowser. The handler reloads with noTypeGuess: true so the user's explicit choice is preserved on subsequent loads.
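
The detection rules listed for dataLoader.ts can be sketched roughly as follows. This is a minimal illustration written from the description above, not the code in the PR: the exact members of the `DataFileType` union and the quote-stripping regex are assumptions, and the `noTypeGuess` check is left to the caller.

```typescript
// Assumed shape of the union; the real DataFileType in interfaces.ts may differ.
type DataFileType = 'json' | 'csv' | 'tsv' | 'topojson';

// Sketch of the heuristic: only csv-typed input is inspected, and only its first line.
function guessDelimiterType(text: string, type: DataFileType): DataFileType {
    // Non-csv types pass through unchanged.
    if (type !== 'csv') return type;

    // Split on LF or CRLF and keep only the first line.
    const firstLine = text.split(/\r?\n/, 1)[0] || '';

    // Strip quoted fields (including "" escapes) so that commas inside
    // values such as "Last, First" are not counted as delimiters.
    const unquoted = firstLine.replace(/"(?:[^"]|"")*"/g, '');

    const tabs = (unquoted.match(/\t/g) || []).length;
    const commas = (unquoted.match(/,/g) || []).length;

    // Conservative: a tie keeps the original csv type.
    return tabs > commas ? 'tsv' : 'csv';
}
```

With this sketch, a header such as `"Last, First"<TAB>Age` resolves to `tsv`, because the quoted comma is removed before counting, while a line with one tab and one comma stays `csv`.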

Copilot AI linked an issue Apr 27, 2026 that may be closed by this pull request
Copilot AI changed the title from "[WIP] Fix SandDance column detection for tab-delimited CSV files" to "Auto-detect tab-delimited CSV files" Apr 27, 2026
Copilot AI requested a review from danmarshall April 27, 2026 20:29
@danmarshall
Contributor

@copilot maybe we should add a ui step on the upload dialog, both before and after we get the file?
before: let the user choose the character
after: validate the character was found, and they have a chance to switch

Actually, if this is the VSCode extension and the user didn't explicitly upload, where would the UI go...
I'm envisioning a scenario where I have 3 csv files on my disk delimited by comma vs period or semicolon... maybe your guess code is the best, but we also might want just a "reload as..." button.

Contributor Author

Copilot AI commented Apr 28, 2026

@copilot maybe we should add a ui step on the upload dialog, both before and after we get the file?
before: let the user choose the character
after: validate the character was found, and they have ...

Added a Delimiter dropdown to the Data browser sidebar tab (commit 8226757). It appears whenever the loaded file is CSV or TSV, and lets users switch between "Comma (,)" and "Tab" at any time — this works in the VSCode extension too since it lives in the explorer sidebar, not a file upload dialog.

Implementation details:

  • The auto-detection still runs on first load (e.g. a tab-delimited .csv auto-detects to Tab), so the dropdown shows the correct current delimiter immediately
  • When the user changes the dropdown, the file is re-parsed with the chosen delimiter and noTypeGuess: true, so the explicit choice is respected and won't be overridden by auto-detection on reload
  • The DataContent.type field carries the detected type back to state cleanly, without mutating the input DataFile
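
The interplay between auto-detection, `noTypeGuess`, and `DataContent.type` described above might look roughly like the sketch below. The interface shapes are heavily simplified, and `guessType`, `loadText`, `reloadAs`, and the `rows` field are hypothetical stand-ins invented for illustration; the real loader goes through `vega.read()` and `loadDataArray()`, which are not reproduced here.

```typescript
// Assumed, simplified shapes; the real interfaces carry more fields.
type DataFileType = 'json' | 'csv' | 'tsv' | 'topojson';

interface DataFile {
    type: DataFileType;
    noTypeGuess?: boolean; // true when the user picked the delimiter explicitly
}

interface DataContent {
    rows: string[][];      // hypothetical stand-in for the parsed data
    type?: DataFileType;   // the type that was actually used for parsing
}

// Simplified stand-in for the detection step (first line only, no quote handling).
function guessType(text: string, type: DataFileType): DataFileType {
    if (type !== 'csv') return type;
    const firstLine = text.split(/\r?\n/, 1)[0] || '';
    const tabs = firstLine.split('\t').length - 1;
    const commas = firstLine.split(',').length - 1;
    return tabs > commas ? 'tsv' : 'csv';
}

// Hypothetical loader: skips detection when noTypeGuess is set, and reports
// the parsed type on the result instead of mutating the input DataFile.
function loadText(dataFile: DataFile, text: string): DataContent {
    const type = dataFile.noTypeGuess ? dataFile.type : guessType(text, dataFile.type);
    const delimiter = type === 'tsv' ? '\t' : ',';
    const rows = text
        .split(/\r?\n/)
        .filter(line => line.length > 0)
        .map(line => line.split(delimiter)); // naive split, for illustration only
    return { rows, type };
}

// Hypothetical counterpart of the dropdown handler: returns a new DataFile
// that pins the chosen type so a later load does not re-guess it.
function reloadAs(dataFile: DataFile, type: DataFileType): DataFile {
    return { ...dataFile, type, noTypeGuess: true };
}
```

In this sketch, loading a tab-delimited file declared as `csv` yields `type: 'tsv'` on the returned content, while `loadText(reloadAs(file, 'csv'), text)` parses the same text with commas because the explicit choice bypasses the guess.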

Copilot AI changed the title from "Auto-detect tab-delimited CSV files" to "Auto-detect tab-delimited CSV files with manual delimiter override" Apr 28, 2026
Development

Successfully merging this pull request may close these issues.

How to handle tab delimited csv file?

2 participants