This article outlines some of the key concepts around what is required for a file to be formatted easily.
It will also outline the characteristics of files that cannot be formatted easily (in which case we recommend getting a better source file to work with).
Requirements for a File to be Formattable
Sometimes the best file a client can export is not in a MindBridge-ingestible format and it is difficult to format the file yourself. Our data team is available to help format many complicated files, but only if there is some way to systematically identify each of the required fields.
Each required field must satisfy some of the following:
- Follows a pattern - For example: The account number and account description are in the same column, but they are separated by a consistent delimiter, or account numbers are all alphanumeric without spaces, or all account numbers are a specific length.
- Is in a single column - This is the ideal format and will be the eventual result.
- If not in a single column, then be in one of a small number of columns - This case is manageable if, for example, debits are in columns F or G, and no other fields are in columns F or G, or debits can be in F or G, and so can a memo, but a debit always entirely numeric while a memo always has non-numeric characters. However, it is usually time-consuming to format this case because it takes time to figure out the pattern and then write code to spot and deal with this pattern, especially in big files where the pattern might change very far down.
- Contain accurate data - If the amounts are incorrect, certain digits missing, or if there are missing sections then the file will likely be unusable.
In other words, columns must be well defined if not lining up, and the data must be correct.
Also see the Data Requirements page for more details on the criteria that a formatted file must fulfill. If there is no way to transform the file to satisfy these requirements, the file is not formattable.
Characteristics of Unformattable Files
- PDF or image format files - Files that end up in such unusable formats were often initial exports to PDF or image or any format that is not meant for data analysis, then converted to Excel or text. This is why we do not support PDF or image format files, even if you run them through a PDF/image converter.
- Merged cells - One common characteristic of an unformattable file is the presence of merged cells all over the file, and not just in the headers or in specific columns.
- Excessive columns - Another sign that a file might not be formattable is the presence of numerous columns, in addition to the characteristics described above (many merged cells, columns not always lining up or well defined). More than 30 columns can be suspect (beyond column AD or so in Excel), and especially more than 50 columns (beyond column AX in Excel) becomes very suspect.
- Fixed-width text files (sometimes) - Although MindBridge cannot directly ingest fixed-width format text files, they can be formattable if each field always falls in a specific position, for example, debits are always between character positions 49 and 65 on each row. Our data team can also format fixed-width format text files that are grouped, as long as the transaction rows all satisfy the condition of each field always falling in a specific position, and there is some way (e.g., based on pattern and/or length) to identify the transaction rows and the heading rows with the account information that needs to be ungrouped.
Example of an Unformattable File:
In this example, we have Net Change, Beginning Balance, Debit, Credit, and Ending Balance appearing in numerous columns, and often Debit and Credit are both in one column:
Sometimes there are two or three numbers in a cell - they could be Net Change, Beginning Balance, and Debit:
Notice how there are numbers randomly in different cells, and even the headers can be in many different columns (e.g., Beginning Balance). There is no systematic way to identify, for a given row, where to extract the debit or credit amount.