Category: Data

Big data overload: an addiction to data will not help us grasp the future correctly

“There are three kinds of lies: lies, damned lies, and statistics,” as Mark Twain famously quoted. Writing in an era of information explosion, the author of this article shows from several angles how deceptive data can be. Some of the deception comes from prejudice and some from the way the data is handled, but all in all, data may not be as simple and reliable as we think.

The world constantly tells us that data will reveal the truth. But the same data can tell different stories, depending on what kind of data it is and how you interpret it. When two similar data sets, interpreted by different people, produce two distinct conclusions, it makes me wonder what the truth is. Data is a tool in people's hands, and we can explain it according to our needs. To be clear, the problem is not that we deliberately hide data for our own purposes, although people sometimes do. I just want to emphasize that humans can bring unconscious bias to the interpretation of data.

In the era of big data, this is a huge problem. When different data sets show you completely different pictures of the same issue, how do you find the answer?

Data can always be manipulated

Pam Baker, author of the book Data Divination: Big Data Strategies, approaches this issue from the perspective of data science, but she still insists that you must first ask the right questions to get the right answers.

Baker explained to me in an email: “Data is pulled based on its relevance to a precise question. Algorithms include a clear way of entering the question and returning an answer as quickly as possible.”

She said data scientists have many tools for this job, but mistakes can still happen. “Of course there's always the possibility of mistakes, but data science, and science in general, had solved many of these problems long before the advent of big data. That said, if the data is wrong or the data points used by the algorithm are flawed, the answer will be wrong or flawed too.”

So far this all holds, but we know that data scientists have their limits. Nearly every company I hear from is talking about data, yet most lack the experience to understand one thing: data can be manipulated to give you the answer you want.

Earlier, at the Gilbane conference in Boston, I heard a lot of claims of this kind. One speaker said people don't keep many apps, and that the average person has only 10 installed. He also said 90% of people don't mind receiving marketing text messages. But bear in mind that his company provides solutions for SMS advertising. He shared a lot of data and offered a lot of suggestions, but if you actually designed your app strategy around them, that would be foolish.

The speaker also showed a figure claiming that 154,000 apps are downloaded every minute. But if everyone has fewer than 10 apps, how can downloads keep up that pace? Once you see the contradiction between the figures, you realize the picture the data paints is not so clear. Maybe the old saying makes more sense than we imagined: “there are three kinds of lies: lies, damned lies, and statistics.”
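A rough back-of-the-envelope check makes the tension between the two figures concrete. The sketch below uses only the two numbers quoted by the speaker; the assumption that every download is a distinct app that a person keeps, and that nobody ever exceeds 10 apps, is a deliberate simplification added here for illustration.

```python
# Figure quoted by the speaker: app downloads per minute.
DOWNLOADS_PER_MINUTE = 154_000
# The speaker's other claim: the average person has about 10 apps installed.
APPS_PER_PERSON = 10

MINUTES_PER_YEAR = 60 * 24 * 365

downloads_per_year = DOWNLOADS_PER_MINUTE * MINUTES_PER_YEAR

# Hypothetical simplification: each person downloads exactly 10 apps and
# never replaces or re-downloads any of them.
people_needed_per_year = downloads_per_year // APPS_PER_PERSON

print(f"{downloads_per_year:,} downloads per year")
print(f"{people_needed_per_year:,} new 10-app users needed per year")
```

Under that simplification the download rate would require billions of brand-new users every year, which suggests that at least one of the two figures measures something different from what it appears to (for example, downloads versus apps kept).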

Getting hold of data is not hard; the key is learning to analyze and judge it

When we put data into the hands of ordinary people rather than data scientists, as Baker suggested, the results can be very bad. That is especially true of marketers trying to use data to sell their products or services. Worse still, they may use the wrong information to paint a rosy picture of their market.

Scott Liewehr, president of Digital Clarity Group, said the situation is very dangerous. He told me that marketers must strive to design valid research; otherwise they may use the wrong data to reach the wrong conclusions and waste company resources. “For marketers, it is a big challenge. Anyone can find data to back whatever story they want to tell,” Liewehr told me. “If they don't know how to do research analysis, it can cause a series of bad decisions.”

Baker agrees. But she also said that marketers can help make sense of the data, because they understand market dynamics better than data researchers do, and combining the two can produce better results. “Sometimes marketing and sales staff know better than data scientists what to ask. That is why we need a data team made up of different kinds of people,” she said.

But she also said that even then you cannot always get the correct information. “Sometimes business users will struggle, only to draw the wrong conclusion, because they don't understand statistical methods and the other methods needed to do the work.”

Even when you are careful, data does not always lead you to the correct conclusion

Last week I wrote a report about the most popular enterprise sync and sharing tools, based on a 451 Research study. This is a very reputable firm, and it ran two surveys over more than a month before publishing the research. I don't mean to malign the results of the study, but in my report I questioned whether they had asked the right questions, or asked them of the right people. Rather than simply looking at general usage, they should have asked carefully about the proportion of enterprise licenses to ordinary user licenses. Had they done so, would they have seen a totally different conclusion? My point here is not that particular study, but the realization that data is not as straightforward as you might imagine.

First of all, the 451 Research report found that more than 40% of respondents use Dropbox, a far higher share than any other company, which startled me when I reported it. Box, which is built around the enterprise cloud, came fourth in the survey, chosen by about 15% of respondents, but that is not necessarily the whole story.

Ilya Fushman, Dropbox's enterprise product manager, told me last week that Dropbox has 100,000 business users (small businesses as well as larger companies). Considering that Dropbox only launched this product in April 2013, the number is very surprising. Interestingly, Box told me that it has 39,000 users by comparison, but that is not the whole picture, because Box has some very large customers.

Box's customers include giants such as Eli Lilly, Toyota, DreamWorks, Comcast, MD Anderson and GlaxoSmithKline, and it recently sold 300,000 enterprise licenses to GE. Add Schneider Electric's 65,000 licenses and Procter & Gamble's 44,000, and you can draw a completely different conclusion about enterprise users from the one 451 Research reached, even though the total number of companies differs.
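The difference between counting customers and counting licensed seats can be sketched with the figures quoted above. This is only an illustration: it assumes the 100,000 and 39,000 figures count customer organizations, which the source does not state precisely, and it uses only the three Box deals named in the text.

```python
# Customer counts as reported to the author (assumed to be organizations).
dropbox_business_customers = 100_000
box_customers = 39_000

# Seat (license) counts for the three Box deals named in the article.
box_named_deals = {
    "GE": 300_000,
    "Schneider Electric": 65_000,
    "Procter & Gamble": 44_000,
}

seats_in_named_deals = sum(box_named_deals.values())

print(f"Customers: Dropbox {dropbox_business_customers:,} vs Box {box_customers:,}")
print(f"Seats in just three Box deals: {seats_in_named_deals:,}")
```

Measured by customer count, Dropbox looks far ahead; measured by seats, three Box deals alone cover more than 400,000 licenses. Which number answers the question depends on what the question actually was.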

For the record, it is hard to know how many users Dropbox really has, because it does not disclose such data, but its large enterprise customers also include many well-known brands, such as Hearst, Hyatt, MIT and News Corp. Dropbox also displays the logos of some smaller companies on its website.

Alan Pelz is an analyst with 451 Research and one of the authors of the study. He said that his team is still refining its methods, and that the data published so far is only the beginning of a long process of market research.

“I think the October survey data tells us some new facts. First of all, the company has a large following in the enterprise (which should not surprise anyone, least of all its competitors). The market is still very immature but in a growth phase, and many companies are still reluctant to put their data in the public cloud. It will be very interesting to watch these trends develop over time. What the new research will dig into is who really gets value out of this, and how that changes over time. We are also researching market segmentation and revenue models for this new field,” he wrote to me in an email.

Data does have great value, but even when you are very careful, its ambiguity and messiness can still lead you to the wrong answer. Even if we have all of the data, it will still deviate from reality. You must make sure your data is accurate for the specific problem and follow best practices in analyzing it. Even then, you may get completely unexpected results. Following the data to a conclusion, it seems, is not as easy as it sounds.



Intel Basis Peak: a “non-professional” wristband focused on activity data

The Basis B1, a tracking wristband launched in 2013, was impressive but not perfect. The Basis Peak improves considerably on the B1: in design, data display and ease of use, it is better than the B1.

Most important of all, the new Basis Peak ($200) has more precise heart rate monitoring. I tested the Peak last week, and although its heart rate monitoring is not perfect, it comes very close to what Basis advertises. This matters, because Basis presents itself as the state of the art in heart rate sensing, while many activity-tracking wristbands leave the feature out altogether because of accuracy problems.

Toward real-time heart rate monitoring

Basis's success has always rested on science: clean, reliable, irrefutable science. Where Jawbone and Fitbit emphasize easy-to-read charts, good looks and comfort, Basis wristbands focus on a battery of sensors and the pursuit of accurate data.

The problem is that the B1 could not meet all of those expectations. Despite its many advertised sensors (an accelerometer, an optical heart rate monitor, a skin temperature sensor and a galvanic skin response sensor), the B1 never produced real-time, continuous heart rate data the way a chest strap monitor does. Instead, the B1's heart rate data fed the Basis algorithms that calculate calories burned and sleep quality.

You could spot-check your heart rate throughout the day, but the first-generation Basis heart rate sensor could not match the performance of dedicated equipment.

Why? Because the sensor could not stay locked on the user's heart rate during movement, so it was not accurate for something like a cross-country run. It was still very useful for charting the user's activity level, and when you were asleep or at rest, the sensor data fed further sleep analysis. Even so, the old heart rate monitor was in line with the state of the technology in 2013.

New technology, new possibilities

Now the Basis Peak can largely meet those expectations. According to Basis, the optical sensor's brighter LEDs reduce interference noise (such as ambient light). The sensor also has a modified photoreceptor, the part that detects traces of blood flow through the absorption of the LED light and so measures heart rate.

The Basis Peak pairs the new sensors with a new chassis. The sensor sits on a raised platform that, like a gasket, forms a firm seal with your skin. The Peak is also lighter than the B1, so it does not lose contact with the skin when you exercise, and its flexible silicone strap keeps the sensor snug against your skin.

You can see what I mean: for Basis to deliver real-time, continuous heart rate tracking, it needs a platform that can track blood flow whether exertion is high or low. Does the new Peak deliver? That is the question I wanted to answer. My control device was a pair of LG heart rate earphones, which proved highly accurate in A/B testing against a chest strap heart rate monitor.

The LG headphones were a near-perfect match for my chest strap monitor. When you are in motion they lock on to your heart rate and follow increases and decreases in exertion. This is an important point: as exertion increases, the LG earphones show heart rate rising at an even pace, and as exertion drops, the numbers fall just as evenly. The LG headphones will never match the accuracy of a hospital-grade electrocardiogram (ECG), but they are far more portable, especially during exercise.

Almost perfect real-time tracking

The Basis Peak tracked the numbers reported by the headphones' smartphone app very well. Most of the time the Peak stayed within 2 to 3 beats per minute of the headphones, and when I tested at a fast pace, the two devices agreed much better than I had expected, still recording results under rapid exertion.

After ten minutes of walking, the Peak showed 133 BPM (beats per minute) and the headset 131 BPM. After twenty minutes, the Peak showed 141 BPM and the headset 142 BPM. So far, so good. But as I approached top speed, the Peak showed some puzzling readings. The headphones' reading kept rising steadily to 154 BPM, well above the Peak's 145 BPM.

A few seconds later, the problem got worse. As I kept accelerating, the headphones read 165 BPM while the Peak showed only 129 BPM. But this lasted only a few minutes. Later, as I started down the mountain, the Peak read 162 BPM (the headset showed 165 BPM). During the descent, both the Peak and the headphones returned to normal BPM levels.
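The paired readings above can be summarized as per-sample deviations between the Peak and the reference headphones. The sketch below uses only the five pairs quoted in this section; the 5 BPM threshold for flagging a reading as anomalous is an arbitrary value chosen for illustration, not something the reviewer or Basis specifies.

```python
# (label, Peak BPM, LG headphone BPM) pairs quoted in the review.
readings = [
    ("after 10 minutes", 133, 131),
    ("after 20 minutes", 141, 142),
    ("near top speed", 145, 154),
    ("a few seconds later", 129, 165),
    ("start of descent", 162, 165),
]

# Hypothetical threshold: flag a reading when it is off by more than 5 BPM.
ANOMALY_THRESHOLD_BPM = 5

# Absolute deviation of the Peak from the reference at each sample.
deviations = [abs(peak - reference) for _, peak, reference in readings]
mean_abs_deviation = sum(deviations) / len(deviations)
flagged = [
    label
    for (label, _, _), deviation in zip(readings, deviations)
    if deviation > ANOMALY_THRESHOLD_BPM
]

print("Deviations (BPM):", deviations)
print("Mean absolute deviation (BPM):", mean_abs_deviation)
print("Flagged samples:", flagged)
```

With so few samples, a single 36 BPM miss dominates the average, which is why the review describes the failures individually rather than reporting one summary number.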

Basis Peak versus Microsoft Band

In fact, this was my second test. The first test, two days earlier, showed similar performance. In the first test the Peak showed abnormal readings twice; in the second, only once. After each failure, the wristband corrected itself and returned to normal.

Any anomaly is cause for concern, but the Peak is at least clearly more accurate than the Microsoft Band (another wearable that claims real-time, continuous heart rate monitoring). I tested both devices on the mountain hike. The Microsoft Band's heart rate readings fluctuated more often (11 times in the second test) and deviated further (when the Basis and LG devices read 156 and 157 BPM respectively, the Microsoft Band showed only 109 BPM). It is worth noting that Basis says the Peak samples heart rate 32 times per second, while Microsoft claims one sample per second.

None of this shows that the Peak is an excellent product. Besides the failures described above, the Peak at times reported no heart rate data at all. I did not pay much attention to these dropouts in the first test, but they came in a barrage. Of course, if the device cannot stay locked on, some false data is bound to reach the user. Still, if you want a wristband for feedback, the data remains usable.

I also tested the Peak on an elliptical machine. It is an imperfect simulation of running uphill, but I made sure the machine's arm motion subjected the Peak to constant movement, keeping it under continuous high-intensity exercise for several minutes. In the first 2 minutes, its readings were lower than the headphones' (130 BPM versus 146 BPM). But after 2 minutes, once I had reached a peak of 162 BPM, the Peak never strayed more than 4 BPM from the headset data.

Under the same test conditions, however, the Microsoft Band's readings stayed below the actual level. For example, when the Basis and LG devices both reported 156 BPM, the Microsoft Band reported only 115 BPM.

The best sleep tracking

I may have focused too much on the Peak's heart rate monitoring, but for most users the wristband has other, more important functions. The Peak records steps, of course, which is standard for this kind of device. Its step counter has some error, but it is manageable. In my first test, for example, my Jawbone UP24 reported 4,702 steps while the Peak reported 4,689. Over five days of testing, the error showed no obvious fluctuation.

Step tracking is good, but I am more interested in the sleep reports. That is less a matter of my own inclination than of what Basis's technology makes possible. Best of all, you do not need to touch any button or interface to start recording an activity. The Peak uses a Basis technology called Body IQ, which automatically detects when you start walking, running or biking. In the same way, it uses Body IQ to start and stop logging sleep cycles.

Illumio, a firewall for enterprise cloud data

After 22 months in stealth, cloud security startup Illumio has stopped developing in secret, and last week it explained how its technology works.

Illumio CEO Andrew Rubin said: “Illumio protects data centers and cloud computing infrastructure. It gives the cloud a modern firewall, and much more.”

Illumio's security technology is a kind of access management tool for enterprise cloud workloads. General security service companies such as Conjur manage access to a company's multi-cloud infrastructure, so that only particular people can get into a particular server or directory. Illumio applies the same idea to every individual workload, ensuring that a specific workload is sent only to a specific server.

Rubin explained: “Whether in a public cloud, a private cloud or on bare metal, users install software agents called virtual enforcement nodes on their server operating systems, and every workload that carries the agent can be moved between servers.”

These agents are governed by a policy compute engine, through which users determine how workloads are delivered. If users need more servers, the compute engine assesses the situation and distributes the workload to the new servers.

“We can build a picture of the whole environment and of the communication between its parts,” Rubin said. “The compute platform delivers policy to each individual workload.”

Rubin says the workload security system can ensure that applications running in a development environment are not sent to production servers. Companies that are wary of the cloud can define security policy inside their own environment, and that policy can then be applied to public cloud infrastructure.

Illumio's approach to research and development included talking with other companies about modern security problems. The company identified those problems before it even had a core product that could solve them.

“Although we have been in stealth, we have in fact talked with more than a hundred companies,” said Alan Cohen, Illumio's chief commercial officer.

Illumio serves Morgan Stanley, Plantronics and Yahoo, and has raised $42.5 million from Andreessen Horowitz, General Catalyst, Formation 8, Data Collective, Marc Benioff and Jerry Yang.



Big data overload: the final winner is hard to imagine

Are you still scratching your head over tools like Hive, Spark and Pig? Don't worry. A race is on to make complex big data technologies such as Hadoop easier for nonprofessional users, and you stand to benefit too. It might even make you rich.

Yes, you.

A few years ago Peter Goldmacher, then an analyst at Cowen & Co., wrote in a research note: “After all, the closer you are to the end users of big data technology, the greater your reward.” In the world of big data, he argued, the biggest winners are not the suppliers of the technology but the companies that use it to create new industries or disrupt traditional businesses.

As time goes by, Goldmacher's 2012 forecast looks more and more correct. The builders of big data infrastructure deserve praise, but the people who profit most are the marketing and sales experts closest to the use of the technology, people who probably could not tell a pivot table from parallel computing.

Provide solutions, not technology

We have already seen this at companies such as John Deere, which has used Hadoop and NoSQL database technology to build very powerful data-driven applications. Silicon Valley may still see itself as the center of the universe, but the wider world outside is where its big data technology is put to the most use.

We should not be surprised. As Goldmacher wrote, this has always been true of technology: “As said before, if we look back at the history of enterprise resource planning, more than two hundred companies were founded and funded to automate standard business processes. This means that an investor in 1990 had less than a 0.5% chance of picking the eventual winner, such as SAP. However, if the investor had bought shares in the Dow 30 in 1990, companies that used enterprise resource planning to cut their cost of goods and their general and administrative costs by 35% and to grow revenue fivefold, the value of that investment would have increased nearly eightfold through large-scale automation.”
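The 0.5% figure in the quote is simply the chance of picking one particular company out of roughly two hundred. A minimal sketch, assuming each company is equally likely to be picked (a simplification introduced here, not a claim from Goldmacher's note):

```python
def chance_of_picking(winners: int, companies: int) -> float:
    """Probability of choosing one of `winners` among `companies`,
    assuming every company is equally likely to be chosen."""
    return winners / companies


# Exactly 200 companies gives 0.5%; "more than two hundred" pushes it lower.
for company_count in (200, 250, 400):
    probability = chance_of_picking(1, company_count)
    print(f"{company_count} companies: {probability:.2%} chance")
```

By contrast, buying the whole Dow 30 involves no picking at all, which is the asymmetry the quote is pointing at.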

Of course, big data infrastructure providers such as Cloudera will reap rewards too. Cloudera's valuation has reached billions of dollars, and other companies, such as DataStax and MongoDB, are valued at more than $1 billion.

But the companies that gain the most from this software are not the ones that build it, for the following reasons:

Most big data technology is open source, which means anyone can use it and it is difficult to profit from.

The principal users of technologies such as Hadoop are developers. They are very important in driving adoption of the technology, but they are not willing to spend money.

Companies that are closer to consumers and relatively well capitalized are more likely to use big data to make a profit.

On the first point, Cloudera co-founder Mike Olson argues that “you can't succeed with a closed-source platform, and you can't build a successful company on open source alone.” This pushes suppliers to combine proprietary and open source licensing to maximize their returns, but the companies at the top of the stack do not have to worry about any of this.

The winner is…

Obviously, the winners are the providers of (specific) applications and services, who hide the technical complexity from end users and charge for the service they provide. Workday co-founder Aneel Bhusri made this point several years ago.

McKinsey & Co. has detailed the influence of big data on different industries.

These include companies like John Deere, which I mentioned earlier. But among the more mainstream technology players, who will win?

The answer is the company that best hides the complexity of its product and makes it easy for users to operate.

Microsoft, for example, fits this pattern. Consider what it has done with Azure Machine Learning. Azure Machine Learning is expected to eliminate almost all of “the up-front costs associated with authoring, developing and scaling machine learning solutions”, while “visual workflows and startup templates make common machine learning tasks simpler”.

There is plenty to criticize about Microsoft (and I often do), but it has done more than any other company to reduce the difficulty of complex computing. Windows, Visual Studio and many other technologies opened the door to the creativity of mainstream system administrators and developers, and Azure Machine Learning follows in that tradition.

No more geeks!

However, we need to go further. Big data has been good for developers and system administrators, but the real challenge is making data easier to use for ordinary people like you and me. Wikibon analyst Dave Vellante put it this way:

Business intelligence created a class of analysts, but it never became mainstream. We hope that big data can become mainstream.

One company that looks well suited to do this is Adobe. Adobe has long served creative professionals, and its takeover of Omniture a few years ago moved it firmly into the big data world, with a focus on helping marketers reach potential customers.

The key to managing big data is less the sheer volume of data than the variety of data sources and data types. For a company like Adobe, enabling marketers to make decisions about advertisements, charts and so on in a very short time means gathering and analyzing information from social media, cash register receipts and many other sources to understand customer behavior.

Out of the weeds

Microsoft and Adobe are just two examples of possible big data winners. Plenty of other companies could stand out, hopefully yours among them.

To achieve this, we need to stop obsessing over the technical minutiae of big data and focus instead on the business value it can create. That value is delivered through the applications we use, and it will not go away.

In an interview with Dirk Slama of Bosch, Olson said he has talked with many people who see big data only as a technology, and in his view “these people are not ideal partners, because they are fundamentally not oriented toward business problems”. The real winners will be those who focus on solving real business problems in the big data era.