A while back I investigated the concept of a data obesity index (DOI), or scale, on par with the Body Mass Index (BMI) for people. I was curious whether firms carry more data 'around their waistline' than they require.
And much like an athlete, a firm with very dense, well-packed data might register a higher data obesity rating yet actually work with its data as effectively as a firm in the "normal" data range.
This is a good article on tools, Hadoop, and ETL in the same framework of pushing large volumes of data into a working production environment. IBM's article focuses on the complexity of data volumes, which comes down to knowing where you SHOULD be storing data versus where you ARE storing it.
When firms tell me how much data they have in house, they usually speak in petabytes (PB). Here is an example of a large-scale telescope and an overview of its data management challenges.
Where am I taking this? My data obesity scale will be a function of operational data, regulatory data (think good vs. bad cholesterol ;-) ), and data efficiency. I am working with firms to map their data lean layer (the exoskeleton) and their data fat layer, and to define appropriate percentages of data mass per industry and per firm size.
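To make the idea concrete, here is a minimal sketch of how such a scale might be computed. Everything in it is a hypothetical assumption for illustration: the function names, the ratio-style formula, and the band cutoffs are mine, not a published or validated methodology.

```python
# Hypothetical sketch of a data obesity index (DOI).
# The formula and thresholds below are illustrative assumptions,
# not a published or validated methodology.

def data_obesity_index(operational_tb, regulatory_tb, total_tb):
    """Return a BMI-style ratio: total stored data mass relative to
    the 'lean' data a firm actually needs.

    operational_tb -- data actively used in production (TB)
    regulatory_tb  -- data retained for compliance (TB)
    total_tb       -- all data held by the firm (TB)
    """
    lean_tb = operational_tb + regulatory_tb   # the 'exoskeleton'
    if lean_tb <= 0:
        raise ValueError("lean data mass must be positive")
    return total_tb / lean_tb                  # 1.0 means no excess 'fat'

def classify(doi):
    """Map a DOI value onto BMI-style bands (illustrative cutoffs)."""
    if doi < 1.2:
        return "lean"
    if doi < 2.0:
        return "normal"
    return "data obese"

# Example: 300 TB operational + 200 TB regulatory, 1,200 TB held in total
doi = data_obesity_index(operational_tb=300, regulatory_tb=200, total_tb=1200)
print(round(doi, 2), classify(doi))  # prints: 2.4 data obese
```

In practice the cutoffs would have to be calibrated per industry and per firm size, which is exactly the mapping work described above.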
Interested in working with me on this one? Tweet me @afairch on Twitter to start the conversation.