Thursday, May 7, 2015

Weird Fail: Microsoft Excel Brings Entropy to Simple Data Manipulation

I came across this weird issue with Microsoft Excel. I was in the middle of a project where I was updating a lot of numbers in a database. I had a list of specifications stored in a MySQL table, My friendly neighborhood product engineer provided me with a associated specifications listed as a range of numbers like 126.025 - 126..050. A couple of days later, it became clear that my friendly neighborhood product engineer hadn't actually looked at the existing spec table and that the number already existed. So he decided that he just wanted to add tolerance information to the existing number.

Predictably, what I received from the engineer could be calculated by the base formula Number_of-Specs X Effort_Engineer_Willing_to_Spend_Producing_Equivalent_Number (Lots x 0). After thinking about the problem, I realized that I should be able to manipulate the data in Excel and extract the tolerance info -- inertia can be a powerful force in engineering problems.

First, I began by extracting the numbers one each end of the range cells using a combination of new columns and left and right formulas. After I had two columns of individual numbers, I subtracted one number from the other and I now had a column of tolerances. I copied these numbers, then replaced them with the resulting values using "Paste Special - Values". Then I began preparing to import my new tolerance numbers back into MySQL, when I noticed an issue:

While Excel displayed the result as 0.025, when I looked at the underlying data in the formula bar, I saw 0.0249999999999915. WTF?

It's true that 0.0249999999999915 looks a lot like 0.025, but it is not 0.025. For mathematics and calculation, it may be similar, but it is not .025. At first I thought I had done something wrong, accidentally deleted something unintended or otherwise messed it up. But then I did some more tests and produced the same result. Next I thought, "great, my Mac has one of those Pentium errors." I tested on different systems with different processors. Same issue. I tested different versions of Excel. Same issue. I even tested a Bootcamped Mac Mini running Excel in Windows XP. Same issue.

Binary Numbers and Floating Point Precision Errors
In doing a bit more research, I came across issues where it's noted that Excel has issues with accuracy because it stores numbers as binary. Typically though, this tends to impact numbers at 15 figures of precision. In this case, my numbers were -- at least on the surface -- pretty simple.

Here's my best guesses at root cause:
  1. This binary number issue impacts numbers more significantly than the 15-digit binary issue.
  2. Microsoft is taking in .0000000000000085 on each transaction. Similar to the Superman / Office Space strategy, this percentage will enable them to profit some way.
  3. Microsoft has successfully generated an Entropy simulator that operates in the background in Excel. Where did you're data go? Entropy.
Needless to say, it's left me skeptical about using Microsoft Excel for some of these data manipulation tasks.

No comments: