I Wanted Orange (The machine would not make a mistake.)


Chase_fixer project now up on GitHub

Following up on my previous post, I've worked on packaging the scripts up a little more nicely and the result is now available on github.

So far it's handled all the weird cases that I see from my own financial history, but I expect there are a few more oddball scenarios (like wire-transfers or refunds) which may require additional tweaking as time goes on.

Tagged as: , Comments Off

Chase’s malformed transaction records

The problem:

Last month, I tried to import some bank-account records (QFX/OFX formats) into the “You Need A Budget” accounting software, which involves telling it how to recognize certain transactions “groceries” and “gas” etc. This did not go as smoothly as I expected, even for an accounting chore, because many of the payee-name and memo fields had ridiculous values! Manually fixing a lot of scrambled data every month wasn't what I had in mind when it came to simplify my budgeting, so I decided to investigate.

<MEMO>01/20 Purchase $9.41 Cash Back $

I believe whatever steam-powered mainframes JP Morgan Chase uses don't seem to have caught up with the current century: Payee fields and memo fields are combined, truncated, arbitrarily split, and whitespace-trimmed, all presumably as sacrifice to some dark and ancient internal decision that 32 characters (for name) and 32 characters (for memo) were long enough for anybody. Some folks say it's the data-format's problem, but I disagree: Chase's datafile says it complies with OFX v1.02, but if you crack open the spec (dated 1997) it clearly says that at least the memo-field should support 256 characters, not 32.


Payee and Memo don't really contain the right thing:

Payee: "Online Payment 1234567890 To Cap"
 Memo: "ital One Bank"

The rest of the memo would have had my cash-back amount (which might be handy in budgeting software) but is truncated:

Payee: "Grocer &amp; Sons Inc. 12345 Exampl"
 Memo: "e road 01/18 Purchase $20.11 Cash b"

The split here occurs between two words, but the whitespace was trimmed! There's no automatic way to know this is "Park lane" vs. "Park lane":

Payee: "Marios Pizza and Plumbing 5442 Park" 
 Memo: "lane NW"

Current progress

Right now I have a series of Python classes  which:

  • Parses the original OFX file(example)
  • Translates it into a much-more-convenient XML file with similar structure
  • Visits every transaction in the XML file and applies custom logic to fix it up
  • Writes the XML file back out as OFX

So far I'm pretty happy with the result: All I have to do is code logic for a few of the common cases, and run the scripts after I download the OFX files from chase. Here's an example of a super-basic statement visitor that just tries to combine Payee and Memo.

    def visitStatement(self, values):
        name = values.get("NAME", "")
        memo = values.get("MEMO", "")

        if len(name) < 32:
            # When the split occurred, there was whitespace which got trimmed, re-add it
            combined = name + " " + memo
        elif len(name) == 32:
            # The split was forced due to some size limit, and we don't really know if there's
            # a space between them or not...
            combined = name + memo
            pass #TODO warning, larger than ever expected

        values["NAME"] = combined
        values["MEMO"] = "" # No more memo data, it's all inside Payee

From this humble beginning I can branch out into recognizing common patterns (like transfers) and payees and clean up the data appropriately. After generating a new QFX file, the YNAB software seems to handle the longer payee names just fine.

Future work

The biggest problem left is that the data still isn't clean enough: Anything over 64 characters has been lost, and it's not always clear if I need to reinsert whitespace between Payee and Memo. Fortunately, Chase does offer a CSV download, which isn't as useful for importing into accounting applications but does contain the entire original. I just need to find some way to cross-reference between the two, perhaps based on dates, amounts, and some sort of non-whitespace similarity.

Once I have things a little more polished I plan to put them up on Github, but at the moment there are still a lot of hardcoded data-file paths and stuff.

Tagged as: , Comments Off