{"id":987,"date":"2014-02-04T00:43:47","date_gmt":"2014-02-04T08:43:47","guid":{"rendered":"http:\/\/technofovea.com\/blog\/?p=987"},"modified":"2016-03-31T23:34:20","modified_gmt":"2016-04-01T06:34:20","slug":"chases-malformed-transaction-records","status":"publish","type":"post","link":"http:\/\/technofovea.com\/blog\/archives\/987","title":{"rendered":"Chase&#8217;s malformed transaction records"},"content":{"rendered":"<h3>The problem:<\/h3>\n<p>Last month, I tried to import some bank-account records (QFX\/OFX formats) into the \u00e2\u20ac\u0153You Need A Budget\u00e2\u20ac\u009d accounting software, which involves telling it how to recognize certain transactions \u00e2\u20ac\u0153groceries\u00e2\u20ac\u009d and \u00e2\u20ac\u0153gas\u00e2\u20ac\u009d etc. This did not go as smoothly as I expected, even for an accounting chore, because many of the payee-name and memo fields had ridiculous values! Manually fixing a lot of scrambled data every month wasn&#8217;t what I had in mind when it came to simplify my budgeting, so I decided to investigate.<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n&lt;STMTTRN&gt;\r\n&lt;TRNTYPE&gt;DEBIT\r\n&lt;DTPOSTED&gt;20140120120000[0:GMT]\r\n&lt;TRNAMT&gt;-29.41\r\n&lt;FITID&gt;201401020\r\n&lt;NAME&gt;SAFEWAY  STORE  1234 MYCITY ST\r\n&lt;MEMO&gt;01\/20 Purchase $9.41 Cash Back $\r\n&lt;\/STMTTRN&gt;\r\n<\/pre>\n<p>I believe whatever steam-powered mainframes JP Morgan Chase uses don&#8217;t seem to have caught up with the current century: Payee fields and memo fields are combined, truncated, arbitrarily split, and whitespace-trimmed, all presumably as sacrifice to some dark and ancient internal decision that 32 characters (for name) and 32 characters (for memo) were long enough for anybody.\u00c2\u00a0Some folks say it&#8217;s the data-format&#8217;s problem, but I disagree: Chase&#8217;s datafile says it complies with <a href=\"http:\/\/www.ofx.net\/DownloadPage\/Downloads.aspx\">OFX v1.02<\/a>, but if you crack open the spec (dated 1997) it clearly\u00c2\u00a0says that at least the memo-field should support <strong>256<\/strong>\u00c2\u00a0characters, not 32.<\/p>\n<h3>Examples:<\/h3>\n<p>Payee and Memo don&#8217;t really contain the right thing:<\/p>\n<pre style=\"padding-left: 30px;\">Payee: \"Online Payment 1234567890 To Cap\"\r\n Memo: \"ital One Bank\"<\/pre>\n<p>The rest of the memo would have had my cash-back amount (which might be handy in budgeting software) but is truncated:<\/p>\n<pre style=\"padding-left: 30px;\">Payee: \"Grocer &amp;amp; Sons Inc. 12345 Exampl\"\r\n Memo: \"e road 01\/18 Purchase $20.11 Cash b\"<\/pre>\n<p>The split here occurs between two words, but the whitespace was trimmed! There&#8217;s no automatic way to know this is &#8220;Park lane&#8221; vs. &#8220;Park lane&#8221;:<\/p>\n<pre style=\"padding-left: 30px;\">Payee: \"Marios Pizza and Plumbing 5442 Park\" \r\n Memo: \"lane NW\"<\/pre>\n<h3>Current progress<\/h3>\n<p>Right now I have a series of Python classes \u00c2\u00a0which:<\/p>\n<ul>\n<li>Parses the original OFX file(<a href=\"http:\/\/www.ofx.net\/OFXExamplesPage\/OFXExamples.aspx\">example<\/a>)<\/li>\n<li>Translates it into a much-more-convenient XML file with similar structure<\/li>\n<li>Visits every transaction in the XML file and applies custom logic to fix it up<\/li>\n<li>Writes the XML file back out as OFX<\/li>\n<\/ul>\n<p>So far I&#8217;m pretty happy with the result: All I have to do is code logic for a few of the common cases, and run the scripts after I download the OFX files from chase. Here&#8217;s an example of a super-basic statement visitor that just tries to combine Payee and Memo.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n    def visitStatement(self, values):\r\n        name = values.get(&quot;NAME&quot;, &quot;&quot;)\r\n        memo = values.get(&quot;MEMO&quot;, &quot;&quot;)\r\n\r\n        if len(name) &lt; 32:\r\n            # When the split occurred, there was whitespace which got trimmed, re-add it\r\n            combined = name + &quot; &quot; + memo\r\n        elif len(name) == 32:\r\n            # The split was forced due to some size limit, and we don't really know if there's\r\n            # a space between them or not...\r\n            combined = name + memo\r\n        else:\r\n            pass #TODO warning, larger than ever expected\r\n\r\n        values[&quot;NAME&quot;] = combined\r\n        values[&quot;MEMO&quot;] = &quot;&quot; # No more memo data, it's all inside Payee\r\n<\/pre>\n<p>From this humble beginning I can branch out into recognizing common patterns (like transfers) and payees and clean up the data appropriately. After generating a new QFX file, the YNAB software seems to handle the longer payee names just fine.<\/p>\n<h3>Future work<\/h3>\n<p>The biggest problem left is that the data still isn&#8217;t clean enough: Anything over 64 characters has been lost, and it&#8217;s not always clear if I need to reinsert whitespace between Payee and Memo. Fortunately, Chase does offer a CSV download, which isn&#8217;t as useful for importing into accounting applications but <strong>does<\/strong> contain the entire original. I just need to find some way to cross-reference between the two, perhaps based on dates, amounts, and some sort of non-whitespace similarity.<\/p>\n<p>Once I have things a little more polished I plan to put them up on Github, but at the moment there are still a lot of hardcoded data-file paths and stuff.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The problem: Last month, I tried to import some bank-account records (QFX\/OFX formats) into the \u00e2\u20ac\u0153You Need A Budget\u00e2\u20ac\u009d accounting software, which involves telling it how to recognize certain transactions \u00e2\u20ac\u0153groceries\u00e2\u20ac\u009d and \u00e2\u20ac\u0153gas\u00e2\u20ac\u009d etc. This did not go as smoothly as I expected, even for an accounting chore, because many of the payee-name and memo [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[26,27],"_links":{"self":[{"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/posts\/987"}],"collection":[{"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/comments?post=987"}],"version-history":[{"count":9,"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/posts\/987\/revisions"}],"predecessor-version":[{"id":996,"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/posts\/987\/revisions\/996"}],"wp:attachment":[{"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/media?parent=987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/categories?post=987"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/technofovea.com\/blog\/wp-json\/wp\/v2\/tags?post=987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}