While the new suitcases will serve our needs for most travel plans, we know we'll have to make accommodations for longer trips, such as a planned two-week European junket in the spring. The thinking is that we'll either purchase or borrow additional luggage for the longer trips rather than “solve” the suitcase problem once and for all with bags larger than we generally need. In the end, we decided to optimize for 90% of travel at the expense of the extreme 10%.
The suitcase-purchase metaphor extended to several work-related situations in December. The first was my annual get-together with a stats grad school friend. Every year, it seems, we “discuss” the relative merits of his statistical software choice, SAS, versus mine, R or Python. And every year his opening salvo is that my open source tools are limited by my machine's available memory in the size of the data sets they can process, while his choice exploits virtual memory to handle, in theory, much larger data.
I counter that my 64 GB notebook readily accommodates 90% of my statistical work, recently handling a 50M+ record, 70-attribute data set in R with aplomb. Moreover, I argue, I have postgres and MonetDB databases in my pocket to serve R/Python with larger-than-memory data. Finally, I note that I've readily deployed Spark in the cloud for terabytes of data – and, by the way, my options are much cheaper than his. Incidentally, what's the largest data set he statistically analyzed in 2016? A measly 2 GB! Alas, I don't think my friend was persuaded by my 90% logic.
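The database-in-your-pocket idea is simple: push the heavy computation into the database engine so the full data set never has to fit in your analysis environment's memory. Here is a minimal Python sketch of that pattern, using the standard library's sqlite3 as a stand-in for the postgres/MonetDB setups mentioned above; the table and column names are purely hypothetical.

```python
# Sketch: delegate aggregation to a database engine rather than loading
# all rows into memory. sqlite3 (Python stdlib) stands in for
# postgres/MonetDB; the "events" table and its columns are made up
# for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path here would hold out-of-core data
conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5)],
)

# The database does the heavy lifting; only the small aggregated
# result set crosses into Python's memory.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM events GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 15.0), ('west', 7.5)]
conn.close()
```

The same shape works from R via its database connectors: the SQL runs inside the engine, and the analyst pulls back only the summary, which is exactly why modest hardware can serve most of the 90%.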
The second illustration comes from a customer with whom my company, Inquidia Consulting, just completed a brief planning engagement. In business for just three years, the company sells data and analytics as its products. It turned to Inquidia to help develop a product roadmap for the next 5-7 years.
I'm convinced that some companies not-so-secretly wish to be big data, even when they're not. The consulting customer's CEO touted a willingness to build a beefy Hadoop cluster infrastructure early on. But while Inquidia likes nothing better than implementing Hadoop, our requirements-gathering findings generously estimated no more than 10 TB of enterprise data after five years. Not exactly big data – and certainly not a requirement warranting an immediate solution driven by 10% planning.
The good news is that a combination of on-premise and cloud-based infrastructures can manage the risks of a 90-10 solution. The customer's once-daily computation needs would be very demanding, but easily satisfied by a use-when-needed cloud-based cluster.
To address the needs of modest data with regular but short bursts of intense computation, Inquidia proposed an architecture that leveraged on-premise open source database storage and cloud-based computation immediately, evolving to cloud-based analytic databases and computation in the intermediate term. Given the company's experience with open source relational databases and its modest cash situation, we felt this solution made the most sense. In the end, the customer did too.
A final illustration reflects my personal dissonance over whether to purchase a bulky digital SLR camera to meet the 10% of telephoto needs my smartphone cannot accommodate. Happy as I am with the quality of the smartphone camera, I still long for an answer to zoom opportunities. Maybe I'll kludge a solution: purchase the DSLR, but use the smartphone camera for 90% of my shots.