Data has been compared to oil, but perhaps salt is a more apt analogy.

For most of our history, salt has been in short supply. It was so precious, in fact, that Roman soldiers were partially paid in salt – a fact that explains the term ‘salary’. However, that scarcity has disappeared in modern times, and we now consume so much that it threatens our health.

Data is like salt: once a rare and precious marketing commodity, now one in over-supply.

But is it fair to compare a data surfeit with the damaging effects of excessive salt? A study by Paul Slovic, currently professor of psychology at the University of Oregon, suggests so.

He investigated the phenomenon in an ingenious experiment with professional horse-racing handicappers. The handicappers were given 88 variables that were useful in predicting a horse's performance. They then predicted the outcome of races using either 5, 10, 20, 30 or 40 of the variables.

The results were illuminating. Accuracy was the same regardless of the number of variables used. However, over-confidence grew as more data was harnessed. Experts over-estimated the importance of factors that had a limited value. It was only when five data points were used that accuracy and confidence were well calibrated.

If this were a rogue finding then it wouldn't be a concern. However, Slovic's results have been repeatedly validated. Stuart Oskamp, psychology professor at Claremont Graduate University, undertook a similar experiment with thirty-two clinical psychologists. They were given a case history split into four sections, each one dealing with a successive chronological period of a patient's life.

After reading each section, the psychologists answered twenty-five questions about the patient, to which the answers were already known. As with Slovic's experiment, increased information led to a significant increase in confidence but a negligible increase in accuracy.

These experiments suggest that an unbounded enthusiasm for data is dangerous and that advertisers should avoid harnessing data merely because it exists. Instead, as much time, energy and effort should be expended in choosing which data sets to ignore as which to use.

Advertisers who resist this painful cull, and gorge on data, might end up feeling rather sick.

