Statistics for Big Data For Dummies, Wiley

About This Book
This book is intended as an overview of the field of big data, with a focus on the
statistical methods used. It also provides a look at several key applications of big data.
Big data is a broad topic; it includes quantitative subjects such as math, statistics,
computer science, and data science. Big data also covers many applications, such as
weather forecasting, financial modeling, political polling methods, and so forth.

Our intentions for this book specifically include the following:
  • Provide an overview of the field of big data.
  • Introduce many useful applications of big data.
  • Show how data may be organized and checked for bad or missing information.
  • Show how to handle outliers in a dataset.
  • Explain how to identify assumptions that are made when analyzing data.
  • Provide a detailed explanation of how data may be analyzed with graphical techniques.
  • Cover several key univariate (involving only one variable) statistical techniques for analyzing data.
  • Explain widely used multivariate (involving more than one variable) statistical techniques.
  • Provide an overview of modeling techniques such as regression analysis.
  • Explain the techniques that are commonly used to analyze time series data.
  • Cover techniques used to forecast the future values of a dataset.
  • Provide a brief overview of software packages and how they can be used to analyze statistical data.
Because this is a For Dummies book, the chapters are written so you can pick and
choose whichever topics interest you the most and dive right in. There’s no need to
read the chapters in sequential order, although you certainly could. We do suggest,
though, that you make sure you’re comfortable with the ideas developed in Chapters 4
and 5 before proceeding to the later chapters in the book. Each chapter also contains
several tips, reminders, and other tidbits, and in several cases there are links to websites
you can use to further pursue the subject. There’s also an online Cheat Sheet that
includes a summary of key equations for ease of reference.
As mentioned, this is a big topic and a fairly new field. Space constraints allow
only an introduction to the statistical concepts that underlie big data, but we
hope it is enough to get you started in the right direction.

Table of Contents
Cover
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go From Here
Part I: Introducing Big Data Statistics
Chapter 1: What Is Big Data and What Do You Do with It?
Characteristics of Big Data
Exploratory Data Analysis (EDA)
Statistical Analysis of Big Data
Chapter 2: Characteristics of Big Data: The Three Vs
Characteristics of Big Data
Traditional Database Management Systems (DBMS)
Chapter 3: Using Big Data: The Hot Applications
Big Data and Weather Forecasting
Big Data and Healthcare Services
Big Data and Insurance
Big Data and Finance
Big Data and Electric Utilities
Big Data and Higher Education
Big Data and Retailers
Big Data and Search Engines
Big Data and Social Media
Chapter 4: Understanding Probabilities
The Core Structure: Probability Spaces
Discrete Probability Distributions
Continuous Probability Distributions
Introducing Multivariate Probability Distributions
Chapter 5: Basic Statistical Ideas
Some Preliminaries Regarding Data
Summary Statistical Measures
Overview of Hypothesis Testing
Higher-Order Measures
Part II: Preparing and Cleaning Data
Chapter 6: Dirty Work: Preparing Your Data for Analysis
Passing the Eye Test: Does Your Data Look Correct?
Being Careful with Dates
Does the Data Make Sense?
Frequently Encountered Data Headaches
Other Common Data Transformations
Chapter 7: Figuring the Format: Important Computer File Formats
Spreadsheet Formats
Database Formats
Chapter 8: Checking Assumptions: Testing for Normality
Goodness-of-Fit Test
Jarque-Bera Test
Chapter 9: Dealing with Missing or Incomplete Data
Missing Data: What’s the Problem?
Techniques for Dealing with Missing Data
Chapter 10: Sending Out a Posse: Searching for Outliers
Testing for Outliers
Robust Statistics
Dealing with Outliers
Part III: Exploratory Data Analysis (EDA)
Chapter 11: An Overview of Exploratory Data Analysis (EDA)
Graphical EDA Techniques
EDA Techniques for Testing Assumptions
Quantitative EDA Techniques
Chapter 12: A Plot to Get Graphical: Graphical Techniques
Stem-and-Leaf Plots
Scatter Plots
Box Plots
Histograms
Quantile-Quantile (QQ) Plots
Autocorrelation Plots
Chapter 13: You’re the Only Variable for Me: Univariate Statistical Techniques
Counting Events Over a Time Interval: The Poisson Distribution
Continuous Probability Distributions
Chapter 14: To All the Variables We’ve Encountered: Multivariate Statistical Techniques
Testing Hypotheses about Two Population Means
Using Analysis of Variance (ANOVA) to Test Hypotheses about Population Means
The F-Distribution
F-Test for the Equality of Two Population Variances
Correlation
Chapter 15: Regression Analysis
The Fundamental Assumption: Variables Have a Linear Relationship
Defining the Population Regression Equation
Estimating the Population Regression Equation
Testing the Estimated Regression Equation
Using Statistical Software
Assumptions of Simple Linear Regression
Multiple Regression Analysis
Multicollinearity
Chapter 16: When You’ve Got the Time: Time Series Analysis
Key Properties of a Time Series
Forecasting with Decomposition Methods
Smoothing Techniques
Seasonal Components
Modeling a Time Series with Regression Analysis
Comparing Different Models: MAD and MSE
Part IV: Big Data Applications
Chapter 17: Using Your Crystal Ball: Forecasting with Big Data
ARIMA Modeling
Simulation Techniques
Chapter 18: Crunching Numbers: Performing Statistical Analysis on Your Computer
Excelling at Excel
Programming with Visual Basic for Applications (VBA)
R, Matey!
Chapter 19: Seeking Free Sources of Financial Data
Yahoo! Finance
Federal Reserve Economic Data (FRED)
Board of Governors of the Federal Reserve System
U.S. Department of the Treasury
Other Useful Financial Websites
Part V: The Part of Tens
Chapter 20: Ten (or So) Best Practices in Data Preparation
Check Data Formats
Verify Data Types
Graph Your Data
Verify Data Accuracy
Identify Outliers
Deal with Missing Values
Check Your Assumptions about How the Data Is Distributed
Back Up and Document Everything You Do
Chapter 21: Ten (or So) Questions Answered by Exploratory Data Analysis (EDA)
What Are the Key Properties of a Dataset?
What’s the Center of the Data?
How Much Spread Is There in the Data?
Is the Data Skewed?
What Distribution Does the Data Follow?
Are the Elements in the Dataset Uncorrelated?
Does the Center of the Dataset Change Over Time?
Does the Spread of the Dataset Change Over Time?
Are There Outliers in the Data?
Does the Data Conform to Our Assumptions?
About the Authors
Cheat Sheet
Advertisement Page
Connect with Dummies
End User License Agreement

Introduction
Welcome to Statistics For Big Data For Dummies! Every day, what has come to be
known as big data is making its influence felt in our lives. Some of the most useful
innovations of the past 20 years have been made possible by the advent of massive
data-gathering capabilities combined with rapidly improving computer technology.

For example, we have become accustomed to finding almost any information
we need through the Internet. You can locate nearly anything under the sun
immediately by using a search engine such as Google or DuckDuckGo. Finding
information this way has become so commonplace that Google has gradually become a
verb, as in “I don’t know where to find that restaurant; I’ll just Google it.” Just think
how much more efficient our lives have become as a result of search engines. But how
does Google work? Google couldn’t exist without the ability to process massive
quantities of information very rapidly, and its software has to be
extremely efficient.

Another area that has changed our lives forever is e-commerce, of which the classic
example is Amazon.com. People can buy virtually every product they use in their daily
lives online (and have it delivered promptly, too). Often online prices are lower than in
traditional “brick-and-mortar” stores, and the range of choices is wider. Online
shopping also lets people find the best available items at the lowest possible prices.

Another huge advantage of online shopping is that sellers can display product
reviews and offer recommendations for future purchases. Reviews from other
shoppers can give extremely important information that isn’t available from a simple
product description provided by manufacturers. And recommendations for future
purchases are a great way for consumers to find new products that they might not
otherwise have known about. Recommendations are enabled by one application of big
data: the use of highly sophisticated programs that analyze shopping data and
identify items that tend to be purchased by the same consumers.
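The core idea of identifying items that tend to be purchased by the same consumers can be illustrated with a minimal sketch. It assumes the simplest possible approach: count how often each pair of items appears in the same customer’s purchases, then recommend the items most often bought alongside a given one. The shopping data and the function names co_purchase_counts and recommend are hypothetical; production recommendation systems are far more sophisticated than plain pair counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping data: each inner list is one customer's purchases.
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["lantern", "batteries"],
    ["tent", "lantern"],
]

def co_purchase_counts(baskets):
    """Count how many customers bought each pair of items together."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") are the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def recommend(item, counts, top_n=3):
    """Return the items most often bought alongside `item`, best first."""
    partners = Counter()
    for (a, b), n in counts.items():
        if a == item:
            partners[b] += n
        elif b == item:
            partners[a] += n
    return [other for other, _ in partners.most_common(top_n)]

counts = co_purchase_counts(baskets)
print(recommend("tent", counts))  # prints ['lantern', 'sleeping bag']
```

Here the lantern ranks first for tent buyers because two customers bought both, while only one bought a tent together with a sleeping bag. With millions of customers and products, the same counting has to be distributed across many machines, which is where big data technology comes in.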

Although online shopping is now second nature for many consumers, the reality is that
e-commerce has only come into its own in the last 15–20 years, largely thanks to the
rise of big data. A website such as Amazon.com must process quantities of information
that would have been unthinkably gigantic just a few years ago, and that processing
must be done quickly and efficiently. Thanks to rapidly improving technology, many
traditional retailers now also offer the option of making purchases online; failure to do
so would put a retailer at a huge competitive disadvantage.

In addition to search engines and e-commerce, big data is making a major impact in a
surprising number of other areas that affect our daily lives:

  • Social media
  • Online auction sites
  • Insurance
  • Healthcare
  • Energy
  • Political polling
  • Weather forecasting
  • Education
  • Travel
  • Finance

Product details
  • File Size: 7,462 KB
  • Pages: 412
  • File Type: PDF
  • ISBN: 978-1-118-94003-7
  • Copyright: 2015 by John Wiley & Sons, Inc.