Questions? We have answers.

The difference between a file that can be formatted and a file that can't

  • Updated

Summary

Learn some of the key concepts that allow a file to be formatted easily.

It will also outline the characteristics of files that cannot be formatted easily (in which case we recommend getting a better source file to work with).


Requirements for a file to be formattable

Some of your clients' files may not be in a format that MindBridge can ingest, and it may be difficult to format the file yourself. Our data team can help you format complicated files, but only if there is a way to identify each of the required fields systematically.

Data in each required field must satisfy some of the following:

  • Follows a pattern
    • For example, the account number and account description are in the same column, but a consistent delimiter separates them, or; account numbers are all alphanumeric without spaces, or; all account numbers are a specific length.
  • Is in a single column
    • This is the ideal format and will be the eventual result.
  • If the data is not in a single column, then be in one of a small number of columns
    • This is manageable if, for example, debits are in columns F or G, and no other fields are in columns F or G, or debits can be in F or G, and so can a memo, but a debit always entirely numeric while a memo always has non-numeric characters. However, it is usually time-consuming to format this case because it takes time to figure out the pattern and then write code to spot and deal with this pattern, especially in big files where the pattern might change very far down.
  • Contain accurate data
    • If the amounts are incorrect, certain digits missing, or if there are missing sections then the file will likely be unusable.

In other words, columns must be well defined if not lining up, and the data must be correct.

Visit the Data Requirements page for more details on the criteria that a formatted file must fulfill. If there is no way to transform the file to satisfy these requirements, the file is not formattable.


Characteristics of files that cannot be formatted

  • PDF or image format files
    • Files that end up in such unusable formats were often initial exports to PDF or image or any format that is not meant for data analysis, then converted to Excel or text. This is why we do not support PDF or image format files, even if you run them through a PDF/image converter.
  • Merged cells
    • One common characteristic of a file that cannot be formatted is the presence of merged cells all over the file, and not just in the headers or in specific columns.
  • Excessive columns
    • Another sign that a file might not be formattable is the presence of numerous columns, in addition to the characteristics described above (many merged cells, columns not always lining up or well defined). More than 30 columns can be suspect (beyond column AD or so in Excel), and especially more than 50 columns (beyond column AX in Excel) becomes very suspect.
  • Fixed-width text files (sometimes)
    • Although MindBridge cannot directly ingest fixed-width format text files, they can be formattable if each field always falls in a specific position, for example, debits are always between character positions 49 and 65 on each row. Our data team can also format fixed-width format text files that are grouped, as long as the transaction rows all satisfy the condition of each field always falling in a specific position, and there is some way (e.g., based on pattern and/or length) to identify the transaction rows and the heading rows with the account information that needs to be ungrouped.

Example of a file that cannot be formatted

In this example, we have Net Change, Beginning Balance, Debit, Credit, and Ending Balance appearing in numerous columns, and often Debit and Credit are both in one column:

mceclip0.png

mceclip1.png

Sometimes there are 2 or 3 numbers in a cell — they could be Net Change, Beginning Balance, and Debit.

mceclip4.png

Notice how there are numbers randomly in different cells, and even the headers can be in many different columns (e.g., Beginning Balance). There is no systematic way to identify, for a given row, where to extract the debit or credit amount.


Anything else on your mind? Chat with us or submit a request for further assistance.

Was this article helpful?