There are three kinds of lies: lies, damned lies, and statistics, as the line popularized by Mark Twain goes. Writing in an era of information explosion, the author of this article shows from several angles how data can deceive. Some distortions come from bias and some from the way the data is handled, but either way, working with data is not as simple or as reliable as you might think.
The world constantly tells us that data will reveal the truth. But the same data can tell different stories, depending on what kind of data it is and how you interpret it. When two similar sets of data lead to two distinct conclusions because people read them differently, it makes me wonder what the truth really is. Data is a tool in people's hands, and we can explain it according to our needs. To be clear, I am not saying that we deliberately hide data for our own purposes, although people sometimes do. I just want to stress that people can bring unconscious bias to the way they interpret data.
In the era of big data, this is a huge problem. When different data sets paint completely different pictures of the same issue, how do you find the answer?
Data can always be manipulated
Pam Baker, author of the book Data Divination: Big Data Strategies, looks at this issue from the perspective of data science, but she still insists that you must first ask the right questions to get the right answers.
Baker explained to me in an email: “Data is pulled based on its relevance to the precise question. Algorithms state the question in a clear way so that it can be put to the data and answered as quickly as possible.”
She said data scientists have many tools for this job, but mistakes can still happen. “Of course there’s always the possibility of mistakes, but data science, and science in general, solved many of these problems long before the advent of big data. That said, if the data is wrong or the algorithm applied to the data points is flawed, the answer will be wrong or flawed too.”
All of that holds up as far as it goes, but data scientists have their limits. Many companies talk about data, yet I have rarely heard one explain what the data actually means for it, and most lack the experience to understand one thing: data can be manipulated to give you the answer you want.
Earlier, at the Gilbane Conference in Boston, I heard a speaker make a number of claims of this kind. He said people don't keep that many apps, and that the average person has only 10 installed. He also said 90% of people don't mind receiving spam text messages. But keep in mind that he works for a company that provides SMS advertising solutions. He shared a lot of data and offered a lot of suggestions, but if you actually designed your app strategy around them, that would be foolish.
The same speaker showed a figure claiming that 154,000 apps are downloaded every minute. But if everyone has no more than 10 apps, how can downloads keep up that pace? Once you see the contradiction between the two figures, you realize that data like this does not make the picture clear at all. Maybe the old saying makes much more sense than we imagined: “There are three kinds of lies: lies, damned lies, and statistics.”
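The tension between the two figures can be made concrete with a little arithmetic. The sketch below is only an illustration: the two constants are the speaker's figures as reported above, while the assumption that every download is kept and never replaced is mine, and the function names are hypothetical.

```python
# Figures quoted by the conference speaker (as reported in this article).
DOWNLOADS_PER_MINUTE = 154_000
APPS_PER_PERSON = 10

MINUTES_PER_DAY = 60 * 24


def downloads_per_day(per_minute: int) -> int:
    """Scale a per-minute download rate up to a full day."""
    return per_minute * MINUTES_PER_DAY


def phones_filled_per_day(per_minute: int, apps_per_person: int) -> float:
    """How many people would have to fill an empty phone each day.

    ASSUMPTION (not from the speaker): every download is kept, and nobody
    deletes or replaces an app once it is installed.
    """
    return downloads_per_day(per_minute) / apps_per_person


if __name__ == "__main__":
    print(downloads_per_day(DOWNLOADS_PER_MINUTE))
    print(phones_filled_per_day(DOWNLOADS_PER_MINUTE, APPS_PER_PERSON))
```

Under that assumption, the quoted rate would fill more than 22 million empty phones every day. The two numbers only fit together if people keep installing, deleting and replacing apps, which is context that neither figure supplies on its own.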
Getting data is not hard; the key is learning to analyze and judge it
When we put data into the hands of ordinary people rather than data scientists, as Baker suggested, the results can be very bad. That is especially true of marketers who are trying to use data to sell their products or services. Worse, they might use faulty information to paint a rosy picture of their market.
Scott Liewehr, president of Digital Clarity Group, said this situation is very dangerous. He told me that marketers must work to set up valid research; otherwise, they may draw the wrong conclusions from the wrong data and waste company resources. “For marketers, it is a big challenge. Anyone can find something to back up whatever story they want to tell,” Liewehr told me. “If they don’t know how to do research and analysis, it can lead to a series of bad decisions.”
Baker agrees. But she also said that marketers can help with sorting through data, because they understand market dynamics better than data scientists do, and combining the two can produce better results. “Sometimes marketing and sales staff know better than data scientists what to ask. That is why we need data teams made up of different kinds of people,” she said.
But she also said that even with those safeguards, you will not always get the right information. “Sometimes business users will struggle, only to draw the wrong conclusion, because they don’t understand statistical methods and the other methods needed to do the work.”
Even when you are very careful, data will not always lead you to the right conclusion
Last week I wrote a report on the most popular enterprise sync and sharing tools, based on a 451 Research study. This is a very reputable firm, and it ran two surveys over more than a month before publishing its research. I don't want to disparage the results of the study, but in my report I questioned whether the researchers asked the right questions, or asked them of the right people. Rather than simply looking at overall usage, they should have asked carefully about the proportion of enterprise licenses to ordinary user licenses. If they had, would they have reached a totally different conclusion? Data is not as easy as you might imagine, and that goes for more than the research I discuss in this article.
First of all, the 451 Research report found that more than 40% of respondents use Dropbox, a far higher share than any other vendor, and I was startled when I reported that finding. Box, which is built on an enterprise cloud model, came fourth in the survey, chosen by about 15% of respondents. But that is not necessarily the whole story.
Ilya Fushman, an enterprise product manager at Dropbox, told me last week that Dropbox has 100,000 business users, small businesses as well as larger companies. Considering that Dropbox only launched this product in April 2013, the number is surprising. By comparison, Box told me it has 39,000, but that is not the whole story either, because Box has some very large customers.
Box's customers include giants such as Eli Lilly, Toyota, DreamWorks, Comcast, MD Anderson and GlaxoSmithKline, and it recently sold 300,000 enterprise licenses to GE. If you add Schneider Electric's 65,000 licenses and Procter & Gamble's 44,000, you can draw a conclusion about enterprise users that is completely different from 451 Research's, even though the total number of companies points the other way.
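A small sketch shows how the choice of unit changes the picture. The numbers are the ones reported in this article; treating the two headline figures as counts of organizations is an assumption on my part, and the variable names are hypothetical. No seat total is available for Dropbox, so none is invented here.

```python
# Headline figures each vendor gave (ASSUMPTION: both count organizations).
business_customers = {"Dropbox": 100_000, "Box": 39_000}

# Licenses (seats) in three Box deals named in this article.
box_named_deals = {
    "GE": 300_000,
    "Schneider Electric": 65_000,
    "Procter & Gamble": 44_000,
}

# Measured by number of customers, which vendor is ahead?
leader_by_customers = max(business_customers, key=business_customers.get)

# Measured by seats, three Box deals alone add up to this many licenses.
box_seats_in_named_deals = sum(box_named_deals.values())

if __name__ == "__main__":
    print("Leader by customer count:", leader_by_customers)
    print("Box seats in three named deals:", box_seats_in_named_deals)
```

Counted by customers, Dropbox is well ahead; counted by seats, three Box deals alone come to 409,000 licenses. The two totals are in different units, so neither settles the question by itself, which is the point: the answer depends on what you choose to count.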
For the record, it is hard to tell how many users Dropbox has inside large companies, because it does not disclose that data, but its large enterprise customers also include well-known names such as Hearst, Hyatt, MIT and News Corp. Dropbox also displays the logos of some smaller companies on its website.
Alan Pelz, an analyst with 451 Research, is one of the authors of the study. He said his team is still refining its methods, and that the data published so far is only the beginning of a long process of market research.
“I think the October survey data tells us some new facts. First of all, the company has a large following in the enterprise (which shouldn’t surprise anyone, least of all its competitors). The market is still very immature but growing, and many companies remain reluctant to put their data in the public cloud. It will be very interesting to see how these trends develop over time. What our new research will dig into is who is really getting value out of this, and how that changes over time. We are also working on segmenting this new market and researching its revenue models,” he wrote to me in an email.
Data does have great value, but even when you are very careful, its ambiguity and messiness can still lead you to the wrong answer. Even if we had all of the data, there would still be a gap between the data and reality. You must make sure your data is accurate for the specific question you are asking, and follow best practices in analyzing it. Even then, you may get completely unexpected results. Following the data to a conclusion, it seems, is not as easy as it sounds.