April 5, 2011

Data Stampede!

During your CS education, and even while working on small problems in your real working life, you get the idea that data is like a school of fish. All the same, all going in the same direction, shimmering and moving as one. Every now and then one fish will go in the wrong direction because it's a bad fish, or a piece of seaweed. Anyway, it's not a real valuable fish, so you just drop it.

Real world data isn't like that. It's more like a cattle drive. It's wandering all over, trying to get away, breaking its leg, suddenly all showing up at the same time and trying to trample you, spooking at loud noises. It smells bad, some of it isn't going to fetch much when you get it to Chicago, it makes weird noises all the time, and frankly it's dumb as a sack of cheese.

But it's all data. Unlike the school of fish, where the ringers are thrown in and can be disregarded, everything in the herd is pretty much a cow. You want to get them all to market. Only with great regret will you leave one behind, so you have to do the work of rounding up stragglers and checking brands.
