Predicting Outliers

I have recently been completely immersed in trying to understand the position of outliers. Why are these anomalous data points considered outliers in the study of normally distributed data sets? The simplest definition of an outlier is “an outlying observation that appears to deviate markedly from other members of the sample in which it occurs”.

I believe one of the most important parameters missing from this simple definition is the “time” dimension. An outlier or anomaly is only outside the norm when understood relative to time. What is an outlier now may become the norm just seconds later, or only many years from now. Identifying outliers is important: if we could do it effectively, we could focus on the top 20% that give the most bang, rather than wasting critical resources wooing the 80% that are the norm but give only 50% of the payback.
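To make the point about time concrete, here is a minimal sketch in Python. It assumes one simple way of defining "the norm", namely the mean and standard deviation of the last few observations, and flags a value only when it sits far from that recent history. The function name, the window of 5, the threshold of 3 and the sample readings are all made up for illustration; this is one possible approach, not a standard recipe.

```python
from statistics import mean, stdev

def rolling_outlier_flags(values, window=5, threshold=3.0):
    """Flag values lying more than `threshold` standard deviations
    from the mean of the `window` observations just before them.

    Hypothetical helper for illustration; window and threshold are
    arbitrary assumptions.
    """
    flags = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history to define a norm yet
            continue
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any different value stands out
            flags.append(x != mu)
        else:
            flags.append(abs(x - mu) / sigma > threshold)
    return flags

# Made-up readings: the level jumps from about 10 to about 50 and stays there
readings = [10, 11, 10, 12, 11, 50, 52, 51, 49, 50, 51]
flags = rolling_outlier_flags(readings)
print(flags)
```

With these sample readings, only the first 50 is flagged. The values around 50 that follow are no longer outliers, because by then the recent history has absorbed the jump: what was an outlier a moment ago has become the norm.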

The most intriguing part is that the more the data is skimmed, the faster and easier it becomes to identify the outliers, from which we can get 90% of the bang for 10% of the buck. The faster and better we analyze data, the faster our ability to spot outliers grows, as interlinking patterns appear in the data.

Building a data warehouse gives us the ability to mine data, anticipate its unknown influences and links, use it intelligently and find ways to monetize these outliers.
However, the most significant aspect is the context of time. It’s very intriguing to see how time, as a barometer of acceptance or rejection, affects outliers and influences their gradual conversion into the norm.

Outliers exist in our society too, and we encounter them often. Not so long ago, it was taboo to openly discuss one’s sexual preference or orientation. Such people were outliers 20-30 years ago. Since then, the acceptance and understanding of these outliers has changed (relative to time, as we say). Today we live in a more tolerant and accepting society that no longer considers them outliers. Voices supporting their rights and privileges are growing louder, with campaigns across the globe to give them freedom and ‘at par’ status.

On the other hand, society sometimes creates outliers for religious, social and economic reasons, as in the latest murmurings of certain orthodox religions that prohibit women from entering places of worship. One wonders whether this developing discrimination will create another set of outliers for the next 10 or 20 years.

In both these instances, time has had, and will continue to have, a profound influence on outliers.

Similarly, in the business world, our ability to cash in on outliers is extremely time sensitive. The outliers of yesterday have become the biggest customers of today, and with time they will be replaced by new outliers who become the customers of tomorrow.

Predicting outliers and assessing their interdependence is a science that many Wall Street stalwarts have known and practiced effectively over time, betting on industries that appear to be on the fringe today (outliers) to rake in the moolah when they become successful some years later.

Another case in point is Red Box DVD. It started as a general convenience kiosk in 2000, collected data on its usage and saw DVD rentals as its outlier. In 2002 Red Box decided to change, rebrand and reposition itself as a DVD-rental-only kiosk for a very defined customer: live-in couples who wanted to catch a weeknight movie in the comfort of their home, ordering dinner from a local Chinese restaurant and renting a movie for just a dollar, all at less than half the price of a dinner and a movie out!

In predicting and leveraging one’s outliers lies the key to unlocking unlimited potential, though that potential may not necessarily be timeless either!
Businesses that don’t see outliers in their customer data die very early on.