Best practices for formatting CSV files

Updated February 08, 2024 14:40

Summary

CSV (Comma-Separated Values) files are delimited text files that store tabular data, using a comma ( , ) to separate each value within a row. The semicolon ( ; ), slash ( / ), or pipe ( | ) can also be used as delimiters. These files can be difficult to scan for issues and mistakes, which can lead to errors in MindBridge.

Learn about the most common issues that can occur in a CSV file below, and find out how to resolve them.

Note: App Admins can set custom delimiters for their teams in the library, which can be overridden if needed during the file import process.

Common issues in CSV files

Mismatched columns
- Issue: The number of delimited values in a row does not match the expected number of data fields (i.e., column headers).
- Resolution: Ensure the data fields within the header row are all present, correct, and appear in the correct position. Review the file for missing or additional values, and ensure each row in the file contains the correct number of values for the number of fields.
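
One way to find mismatched rows before importing is to compare each row's value count against the header row. The following is a minimal sketch using Python's standard csv module; the function name find_mismatched_rows and the sample data are illustrative assumptions, not part of MindBridge.

```python
import csv
import io

def find_mismatched_rows(text, delimiter=","):
    """Return (line_number, value_count) for rows that do not match the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != expected
    ]

# Hypothetical sample: line 3 is missing a value, line 4 has an extra one.
sample = (
    "Account,Description,Amount\n"
    "1000,Cash,250.00\n"
    "2000,Payables\n"
    "3000,Revenue,10.00,extra\n"
)
mismatched = find_mismatched_rows(sample)
print(mismatched)
```

Each reported pair gives the line to review and how many values were found on it.
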
Quoting issues
- Issue: Double-quotation marks ( " ) are missing or mismatched, resulting in improper escaping within a value.
- Resolution: Ensure every double-quotation character ( " ) is part of a pair. If a value itself contains quotation marks, enclose the entire value in double-quotation marks and double each quotation mark that appears inside it. For example, if a value is meant to read "metal" screw costs to cogs, then it should be written as """metal"" screw costs to cogs".
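
If the file is produced by a script, a CSV library can apply this escaping automatically. The sketch below assumes Python's standard csv module with its default settings; the account number 1000 is a made-up sample value.

```python
import csv
import io

# The value as it is meant to read, including its own quotation marks.
value = '"metal" screw costs to cogs'

# csv.writer encloses the value in quotation marks and doubles the inner ones.
buffer = io.StringIO()
csv.writer(buffer).writerow(["1000", value])
line = buffer.getvalue().strip()
print(line)

# Reading the line back recovers the original value.
parsed = next(csv.reader([line]))
print(parsed[1])
```

Writing values through a library, rather than joining strings by hand, is usually the simplest way to keep quotation marks paired.
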
Delimiter confusion
- Issue: Incorrect delimiters, or delimiters appearing within the contents of a value.
- Resolution: Use one of MindBridge's standard delimiters (indicated in the summary above) or a custom delimiter consistently throughout the dataset. If any of the standard delimiters appear within the contents of a value, enclose the entire value in double-quotation marks ( " ). For example, if a value is meant to read costs to cogs, screws, bolts, then it should have a set of quotation marks at the beginning and the end of the value, i.e., "costs to cogs, screws, bolts".
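
To illustrate why the quotation marks matter, the sketch below parses the same row with and without them, using Python's standard csv module. The surrounding values (5000 and 120.00) and the helper name parse are assumptions for illustration only.

```python
import csv

def parse(line, delimiter=","):
    """Split one CSV line into its values."""
    return next(csv.reader([line], delimiter=delimiter))

unquoted = "5000,costs to cogs, screws, bolts,120.00"
quoted = '5000,"costs to cogs, screws, bolts",120.00'

# Without quotation marks, the commas inside the value split it into three.
print(len(parse(unquoted)), parse(unquoted))

# With quotation marks, the value stays in one piece.
print(len(parse(quoted)), parse(quoted))
```

The unquoted row yields five values instead of three, which would also surface as a mismatched-columns issue.
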
Missing values
- Issue: Certain expected values appear to be missing within a row.
- Resolution: Ensure that the imported data is correct and complete. You may proceed with the analysis if blank cells are detected, as MindBridge will ignore blank cells, but if expected data is missing, this can lead to incompleteness and balance issues.
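
A quick scan for blank cells can help decide whether a gap is expected or a sign of missing data. This is a minimal sketch with Python's standard csv module; find_blank_cells and the sample columns are hypothetical names chosen for the example.

```python
import csv
import io

def find_blank_cells(text, delimiter=","):
    """Return (line_number, column_name) for every blank or absent value."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    blanks = []
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # extra values beyond the header are not blanks
            if value is None or not value.strip():
                blanks.append((reader.line_num, column))
    return blanks

# Hypothetical sample with one blank description and one blank amount.
sample = (
    "Account,Description,Amount\n"
    "1000,Cash,250.00\n"
    "2000,,75.00\n"
    "3000,Revenue,\n"
)
blank_cells = find_blank_cells(sample)
print(blank_cells)
```
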
Numeric formatting
- Issue: Inconsistent use of formatting in amount values. For example, some regions use a period as the decimal separator (e.g., 10.00), while others use a comma (e.g., 10,00). Either is acceptable, but only one decimal separator should be used within a dataset; they should not be combined or used interchangeably.
- Resolution: Use one numeric format consistently throughout the dataset.
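
Mixed decimal separators can be spotted with a simple pattern check. The sketch below rests on a stated assumption: a period or comma followed by one or two digits at the end of a value is treated as the decimal separator. Values with no decimal part (such as 1,000) are skipped because they are ambiguous, so this is a rough screen rather than a complete validator.

```python
import re

# Assumption: a separator followed by 1-2 trailing digits marks the decimals.
DECIMAL_TAIL = re.compile(r"([.,])\d{1,2}$")

def decimal_separators(amounts):
    """Return the set of decimal separators that appear in the given amounts."""
    found = set()
    for amount in amounts:
        match = DECIMAL_TAIL.search(amount.strip())
        if match:
            found.add(match.group(1))
    return found

consistent = ["10.00", "1,250.50", "-3.5"]
mixed = ["10.00", "10,00", "7.25"]

print(decimal_separators(consistent))  # one separator: consistent
print(len(decimal_separators(mixed)) > 1)  # more than one: needs review
```
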
Date and time formatting
- Issue: Inconsistent, ambiguous, or unsupported date and time formats.
- Resolution: Ensure all dates and timestamps are consistent, cannot be misinterpreted, and are supported by MindBridge. Review our help article on date formats for additional details.
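
Consistency can be checked by parsing every date against a single expected format. The sketch below uses Python's standard datetime module and assumes the dataset is meant to use YYYY-MM-DD; that format is an illustrative choice, not a statement of which formats MindBridge supports.

```python
from datetime import datetime

def unparseable_dates(values, date_format="%Y-%m-%d"):
    """Return the values that do not match the expected date format."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value.strip(), date_format)
        except ValueError:
            bad.append(value)
    return bad

# Hypothetical sample: one date in another format, one with an invalid month.
dates = ["2024-02-08", "08/02/2024", "2024-13-01"]
bad_dates = unparseable_dates(dates)
print(bad_dates)
```

A value such as 08/02/2024 is also a good example of ambiguity, since it could mean 8 February or 2 August depending on the region.
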
Scientific notation
- Issue: In CSV files, values in scientific notation are taken as presented, regardless of which column they appear in. If mapped to a numeric column (for example, Amount), MindBridge will convert scientific notation to full-length values, which may be less precise than the actual data.
- Resolution: Convert values in scientific notation to their true amounts before importing the file.
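
Where a script prepares the file, values in scientific notation can be expanded with Python's standard decimal module. This is a minimal sketch; expand_scientific is a hypothetical helper name. Note that it can only expand the digits present in the file: if the source system already rounded a value when exporting, the lost digits have to be recovered from the source data.

```python
from decimal import Decimal

def expand_scientific(value):
    """Rewrite a value such as 1.23E+5 in plain fixed-point form."""
    return format(Decimal(value), "f")

large = expand_scientific("1.23E+5")
small = expand_scientific("4.5e-3")
print(large, small)
```

Decimal is used here instead of float so the expansion does not introduce binary rounding of its own.
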
Nested or hierarchical data
- Issue: CSV files are not suitable for grouped data, or nested or hierarchical data structures, as each row within a CSV file represents a single line item.
- Resolution: Ungroup and flatten the data structure so each data field (column) has data present on every row.
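
A common form of grouped data lists a group value (for example, an account number) only on the first row of each group. The sketch below fills that value down onto every row, assuming this particular layout; fill_down and the sample columns are hypothetical, and other grouping styles would need a different approach.

```python
import csv
import io

def fill_down(rows, columns):
    """Copy the last non-blank value of each listed column onto blank rows."""
    last_seen = {}
    flattened = []
    for row in rows:
        filled = dict(row)
        for column in columns:
            if filled[column].strip():
                last_seen[column] = filled[column]
            else:
                filled[column] = last_seen.get(column, "")
        flattened.append(filled)
    return flattened

# Hypothetical grouped export: the second row belongs to account 1000.
grouped = (
    "Account,Description,Amount\n"
    "1000,Cash deposit,250.00\n"
    ",Cash withdrawal,-75.00\n"
    "2000,Supplier invoice,120.00\n"
)
rows = list(csv.DictReader(io.StringIO(grouped)))
flat = fill_down(rows, ["Account"])
accounts = [row["Account"] for row in flat]
print(accounts)
```
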

Portions of this document may have had early-stage drafts generated by AI tool(s) and have been reviewed, edited, and clarified by real humans.
