Statistics and Data Science
Why do we need to have sufficient knowledge of Statistics? What concepts of statistics are useful in Data Science?
The aim of data science is to analyze unstructured data and information extracted from different sources, and to apply scientific disciplines to transform it into meaningful insight. This is where the principles of statistics come in.
Statistics plays a fundamental part in data science, especially by providing tools and techniques such as hypothesis testing, regression, probability distributions, and correlation. These help us find structure in large, complex datasets, commonly known as “big data”.
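To make two of these techniques concrete, here is a minimal sketch of a Pearson correlation and a simple least-squares regression line, written with only the Python standard library. The dataset (hours studied versus test score) and the helper names pearson_r and fit_line are invented for illustration; they are assumptions, not something taken from a particular library or from real data.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical toy data: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)

print(f"correlation: {r:.3f}")
print(f"fitted line: score = {slope:.2f} * hours + {intercept:.2f}")
```

A correlation close to 1 suggests a strong positive linear relationship in this toy data, and the fitted slope estimates how much the score changes for each extra hour. In practice a library such as pandas or SciPy would usually do this work, but the arithmetic underneath is the same.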
Why do we need to explore data? Why do we need to clean the data?
Data cleaning and exploration are among the most crucial and tedious parts of data preparation. According to Steve Lohr of The New York Times, 50% to 80% of a data scientist's time is spent cleaning and organizing the data before the actual analysis begins, all in order to produce a “quality” output. It is extremely important for a data scientist to refine unstructured datasets into a usable dataset that supports better analytics and insights for decision making.
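As a rough illustration of what cleaning can involve, the sketch below trims whitespace, normalizes capitalization, drops rows with missing or invalid values, and removes duplicates. The records, the field names ("name" and "age"), and the clean_records helper are all hypothetical, and real projects will have their own rules for what counts as a valid row.

```python
def clean_records(rows):
    """Return rows with tidy names, integer ages, and no duplicates.

    Rows with a missing name or a non-numeric age are dropped.
    """
    cleaned = []
    seen = set()
    for row in rows:
        # Treat None as an empty string, then trim and normalize case.
        name = (row.get("name") or "").strip().title()
        raw_age = (row.get("age") or "").strip()

        # Skip rows that are missing a name or have an unusable age.
        if not name or not raw_age.isdigit():
            continue

        record = (name, int(raw_age))
        # Skip exact duplicates that only differed by formatting.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"name": name, "age": int(raw_age)})
    return cleaned


# Hypothetical messy input, as it might arrive from several sources.
raw_rows = [
    {"name": "  alice ", "age": "34"},
    {"name": "BOB", "age": ""},
    {"name": "Alice", "age": "34 "},
    {"name": None, "age": "29"},
    {"name": "carol", "age": "n/a"},
    {"name": "dave", "age": "41"},
]

clean_rows = clean_records(raw_rows)
print(f"kept {len(clean_rows)} of {len(raw_rows)} rows")
for row in clean_rows:
    print(row)
```

Even in this tiny example most of the rows need some attention before they can be analyzed, which hints at why cleaning takes up so much of the overall effort.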