Practical Machine Learning with Python

A Problem-Solver’s Guide to Building Real-World Intelligent Systems

Dipanjan Sarkar, Raghav Bali, Tushar Sharma





Book Details
Price: 3.00
Pages: 545
File size: 19,858 KB
File type: PDF
ISBN-13: 978-1-4842-3206-4 (pbk), 978-1-4842-3207-1 (electronic)
Copyright © 2018 by Dipanjan Sarkar, Raghav Bali, and Tushar Sharma

About the Authors
Dipanjan Sarkar is a data scientist at Intel, on a mission to make the
world more connected and productive. He primarily works on Data
Science, analytics, business intelligence, application development, and
building large-scale intelligent systems. He holds a master of technology
degree in Information Technology with specializations in Data Science
and Software Engineering from the International Institute of Information
Technology, Bangalore. He is also an avid supporter of self-learning,
especially through Massive Open Online Courses, and holds a Data Science
Specialization from Johns Hopkins University on Coursera.
Dipanjan has been an analytics practitioner for several years,
specializing in statistical, predictive, and text analytics. Having a
passion for Data Science and education, he is a Data Science Mentor
at Springboard, helping people upskill in areas like Data Science and
Machine Learning. Dipanjan has also authored several books on R,
Python, Machine Learning, and analytics, including Text Analytics with
Python, Apress 2016. Besides this, he occasionally reviews technical books
and acts as a course beta tester for Coursera. Dipanjan’s interests include learning about new technology, financial markets, disruptive start-ups, Data Science, and more recently, artificial intelligence and Deep Learning.
Raghav Bali is a data scientist at Intel, enabling proactive and data-driven
IT initiatives. He primarily works on Data Science, analytics, business
intelligence, and development of scalable Machine Learning-based
solutions. He has also worked in domains such as ERP and finance with
some of the leading organizations in the world. Raghav has a master’s
degree (gold medalist) in Information Technology from the International
Institute of Information Technology, Bangalore.
Raghav is a technology enthusiast who loves reading and playing
around with new gadgets and technologies. He has also authored
several books on R, Machine Learning, and Analytics. He is a shutterbug,
capturing moments when he isn’t busy solving problems.
Tushar Sharma has a master’s degree from the International Institute of
Information Technology, Bangalore. He works as a Data Scientist with
Intel. His work involves developing analytical solutions at scale using
enormous volumes of infrastructure data. In his previous role, he worked
in the financial domain developing scalable Machine Learning solutions
for major financial organizations. He is proficient in Python, R, and Big
Data frameworks like Spark and Hadoop.
Apart from work, Tushar enjoys watching movies and playing badminton,
and is an avid reader. He has also authored a book on R and social media analytics.

About the Technical Reviewer
Jojo Moolayil is an Artificial Intelligence professional and published
author of the book: Smarter Decisions – The Intersection of IoT and
Decision Science. With over five years of industrial experience in A.I.,
Machine Learning, Decision Science, and IoT, he has worked with
industry leaders on high impact and critical projects across multiple
verticals. He is currently working with General Electric, the pioneer and
leader in Data Science for Industrial IoT, and lives in Bengaluru—the
Silicon Valley of India.
He was born and raised in Pune, India, and graduated from the University
of Pune with a major in Information Technology Engineering. He started
his career with Mu Sigma Inc., the world’s largest pure-play analytics
provider, and then moved to Flutura, an IoT Analytics startup. He has also
worked with the leaders of many Fortune 50 clients.
In his present role with General Electric, he focuses on solving A.I.
and decision science problems for Industrial IoT use cases and developing
Data Science products and platforms for Industrial IoT.
Apart from authoring books on decision science and IoT, Jojo has also been a technical reviewer for
various books on Machine Learning and Business Analytics with Apress. He is an active Data Science tutor and maintains a blog at http://www.jojomoolayil.com/web/blog/.
You can reach out to Jojo at:
I would like to thank my family, 
friends, and mentors for their kind support and constant motivation throughout my life.

Foreword
Affordable compute power, made available by Moore’s law, has enabled rapid advances in Machine Learning solutions and driven adoption across diverse segments of the industry. The ability to learn complex models of real-world processes from observed (training) data through systematic, easy-to-apply Machine Learning solution stacks has strongly attracted businesses looking to extract meaningful value. The appeal and opportunities of Machine Learning have produced many resources (books, tutorials, online training, and courses) from which solution developers, analysts, engineers, and scientists can learn the algorithms and implement platforms and methodologies. It is not uncommon for someone just starting out to be overwhelmed by the abundance of material. In addition, working without a structured workflow may fail to yield consistent and relevant results from Machine Learning solutions.
Key requirements for building robust Machine Learning applications and getting consistent, actionable results include investing significant time and effort in understanding the objectives and key value of the project, establishing robust data pipelines, analyzing and visualizing data, and performing feature engineering, feature selection, and modeling. The iterative nature of these projects involves several Select → Apply → Validate → Tune cycles before arriving at a suitable Machine Learning-based model. A final and important step is to integrate the solution (the Machine Learning model) into existing (or new) organization systems or business processes to sustain actionable and relevant results. Hence, a robust Machine Learning solution requires a development platform that is suited not just to interactive modeling but also excels in data ingestion, processing, visualization, and systems integration, with strong ecosystem support for runtime deployment and maintenance.
Python is an excellent choice of language because it fits the need of the hour with its multi-purpose capabilities, ease of implementation and integration, active developer community, and ever-growing Machine Learning ecosystem, all of which have driven its rapidly growing adoption for Machine Learning. The authors of this book have leveraged their hands-on experience with solving real-world problems using Python and its Machine Learning ecosystem to help readers gain the solid knowledge needed to apply essential concepts, methodologies, tools, and techniques for solving their own real-world problems and use-cases. Practical Machine Learning with Python aims to cater to readers of varying skill levels, from beginners to experts, and to enable them to structure and build practical Machine Learning solutions.
—Ram R. Varra, Senior Principal Engineer, Intel


Table of Contents
About the Authors
About the Technical Reviewer
Acknowledgments
Foreword
Introduction
Part I: Understanding Machine Learning
Chapter 1: Machine Learning Basics
  The Need for Machine Learning
    Making Data-Driven Decisions
    Efficiency and Scale
    Traditional Programming Paradigm
    Why Machine Learning?
  Understanding Machine Learning
    Why Make Machines Learn?
    Formal Definition
    A Multi-Disciplinary Field
  Computer Science
    Theoretical Computer Science
    Practical Computer Science
    Important Concepts
  Data Science
  Mathematics
    Important Concepts
  Statistics
  Data Mining
  Artificial Intelligence
  Natural Language Processing
  Deep Learning
    Important Concepts
  Machine Learning Methods
  Supervised Learning
    Classification
    Regression
  Unsupervised Learning
    Clustering
    Dimensionality Reduction
    Anomaly Detection
    Association Rule-Mining
  Semi-Supervised Learning
  Reinforcement Learning
  Batch Learning
  Online Learning
  Instance Based Learning
  Model Based Learning
  The CRISP-DM Process Model
    Business Understanding
    Data Understanding
    Data Preparation
    Modeling
    Evaluation
    Deployment
  Building Machine Intelligence
    Machine Learning Pipelines
    Supervised Machine Learning Pipeline
    Unsupervised Machine Learning Pipeline
  Real-World Case Study: Predicting Student Grant Recommendations
    Objective
    Data Retrieval
    Data Preparation
    Modeling
    Model Evaluation
    Model Deployment
    Prediction in Action
  Challenges in Machine Learning
  Real-World Applications of Machine Learning
  Summary
Chapter 2: The Python Machine Learning Ecosystem
  Python: An Introduction
    Strengths
    Pitfalls
    Setting Up a Python Environment
    Why Python for Data Science?
  Introducing the Python Machine Learning Ecosystem
    Jupyter Notebooks
    NumPy
    Pandas
    Scikit-learn
    Neural Networks and Deep Learning
    Text Analytics and Natural Language Processing
    Statsmodels
  Summary
Part II: The Machine Learning Pipeline
Chapter 3: Processing, Wrangling, and Visualizing Data
  Data Collection
    CSV
    JSON
    XML
    HTML and Scraping
    SQL
  Data Description
    Numeric
    Text
    Categorical
  Data Wrangling
    Understanding Data
    Filtering Data
    Typecasting
    Transformations
    Imputing Missing Values
    Handling Duplicates
    Handling Categorical Data
    Normalizing Values
    String Manipulations
  Data Summarization
  Data Visualization
    Visualizing with Pandas
    Visualizing with Matplotlib
    Python Visualization Ecosystem
  Summary
Chapter 4: Feature Engineering and Selection
  Features: Understand Your Data Better
    Data and Datasets
    Features
    Models
  Revisiting the Machine Learning Pipeline
  Feature Extraction and Engineering
    What Is Feature Engineering?
    Why Feature Engineering?
    How Do You Engineer Features?
  Feature Engineering on Numeric Data
    Raw Measures
    Binarization
    Rounding
    Interactions
    Binning
    Statistical Transformations
  Feature Engineering on Categorical Data
    Transforming Nominal Features
    Transforming Ordinal Features
    Encoding Categorical Features
  Feature Engineering on Text Data
    Text Pre-Processing
    Bag of Words Model
    Bag of N-Grams Model
    TF-IDF Model
    Document Similarity
    Topic Models
    Word Embeddings
  Feature Engineering on Temporal Data
    Date-Based Features
    Time-Based Features
  Feature Engineering on Image Data
    Image Metadata Features
    Raw Image and Channel Pixels
    Grayscale Image Pixels
    Binning Image Intensity Distribution
    Image Aggregation Statistics
    Edge Detection
    Object Detection
    Localized Feature Extraction
    Visual Bag of Words Model
    Automated Feature Engineering with Deep Learning
  Feature Scaling
    Standardized Scaling
    Min-Max Scaling
    Robust Scaling
  Feature Selection
    Threshold-Based Methods
    Statistical Methods
    Recursive Feature Elimination
    Model-Based Selection
  Dimensionality Reduction
    Feature Extraction with Principal Component Analysis
  Summary
Chapter 5: Building, Tuning, and Deploying Models
  Building Models
    Model Types
    Learning a Model
    Model Building Examples
  Model Evaluation
    Evaluating Classification Models
    Evaluating Clustering Models
    Evaluating Regression Models
  Model Tuning
    Introduction to Hyperparameters
    The Bias-Variance Tradeoff
    Cross Validation
    Hyperparameter Tuning Strategies
  Model Interpretation
    Understanding Skater
    Model Interpretation in Action
  Model Deployment
    Model Persistence
    Custom Development
    In-House Model Deployment
    Model Deployment as a Service
  Summary
Part III: Real-World Case Studies
Chapter 6: Analyzing Bike Sharing Trends
  The Bike Sharing Dataset
  Problem Statement
  Exploratory Data Analysis
    Preprocessing
    Distribution and Trends
Outliers ...............................................................................................................................................312
Correlations ........................................................................................................................................314
Regression Analysis ..................................................................................................... 315
Types of Regression ...........................................................................................................................315
Assumptions .......................................................................................................................................316
Evaluation Criteria ..............................................................................................................................316
Modeling ...................................................................................................................... 317
Linear Regression ...............................................................................................................................319
Decision Tree Based Regression .........................................................................................................323
Next Steps .................................................................................................................... 330
Summary ...................................................................................................................... 330
■■Chapter 7: Analyzing Movie Reviews Sentiment ............................................... 331
Problem Statement ...................................................................................................... 332
Setting Up Dependencies ............................................................................................. 332
Getting the Data ........................................................................................................... 333
Text Pre-Processing and Normalization ....................................................................... 333
Unsupervised Lexicon-Based Models .......................................................................... 336
Bing Liu’s Lexicon ...............................................................................................................................337
MPQA Subjectivity Lexicon .................................................................................................................337
Pattern Lexicon ...................................................................................................................................338
AFINN Lexicon ....................................................................................................................................338
SentiWordNet Lexicon ........................................................................................................................340
VADER Lexicon ....................................................................................................................................342
Classifying Sentiment with Supervised Learning ......................................................... 345
Traditional Supervised Machine Learning Models ....................................................... 346
Newer Supervised Deep Learning Models ................................................................... 349
Advanced Supervised Deep Learning Models .............................................................. 355
Analyzing Sentiment Causation .................................................................................... 363
Interpreting Predictive Models ...........................................................................................................363
Analyzing Topic Models ......................................................................................................................368
Summary ...................................................................................................................... 372
■■Chapter 8: Customer Segmentation and Effective Cross Selling ....................... 373
Online Retail Transactions Dataset ............................................................................... 374
Exploratory Data Analysis ............................................................................................. 374
Customer Segmentation ............................................................................................... 378
Objectives ...........................................................................................................................................378
Strategies ...........................................................................................................................................379
Clustering Strategy .............................................................................................................................380
Cross Selling ................................................................................................................ 392
Market Basket Analysis with Association Rule-Mining ....................................................................... 393
Association Rule-Mining Basics .........................................................................................................394
Association Rule-Mining in Action ......................................................................................................396
Summary ...................................................................................................................... 405
■■Chapter 9: Analyzing Wine Types and Quality ................................................... 407
Problem Statement ...................................................................................................... 407
Setting Up Dependencies ............................................................................................. 408
Getting the Data ........................................................................................................... 408
Exploratory Data Analysis ............................................................................................. 409
Process and Merge Datasets ..............................................................................................................409
Understanding Dataset Features ........................................................................................................410
Descriptive Statistics ..........................................................................................................................413
Inferential Statistics ............................................................................................................................414
Univariate Analysis .............................................................................................................................416
Multivariate Analysis ..........................................................................................................................419
Predictive Modeling ...................................................................................................... 426
Predicting Wine Types .................................................................................................. 427
Predicting Wine Quality ................................................................................................ 433
Summary ...................................................................................................................... 446
■■Chapter 10: Analyzing Music Trends and Recommendations ........................... 447
The Million Song Dataset Taste Profile ......................................................................... 448
Exploratory Data Analysis ............................................................................................. 448
Loading and Trimming Data ................................................................................................................448
Enhancing the Data ............................................................................................................................451
Visual Analysis ....................................................................................................................................452
Recommendation Engines ............................................................................................ 456
Types of Recommendation Engines ....................................................................................................457
Utility of Recommendation Engines ....................................................................................................457
Popularity-Based Recommendation Engine ....................................................................................... 458
Item Similarity Based Recommendation Engine .................................................................................459
Matrix Factorization Based Recommendation Engine ........................................................................ 461
A Note on Recommendation Engine Libraries .............................................................. 466
Summary ...................................................................................................................... 466
■■Chapter 11: Forecasting Stock and Commodity Prices ..................................... 467
Time Series Data and Analysis ..................................................................................... 467
Time Series Components ....................................................................................................................469
Smoothing Techniques .......................................................................................................................471
Forecasting Gold Price ................................................................................................. 474
Problem Statement .............................................................................................................................474
Dataset ...............................................................................................................................................474
Traditional Approaches .......................................................................................................................474
Modeling .............................................................................................................................................476
Stock Price Prediction .................................................................................................. 483
Problem Statement .............................................................................................................................484
Dataset ...............................................................................................................................................484
Recurrent Neural Networks: LSTM .....................................................................................................485
Upcoming Techniques: Prophet ..........................................................................................................495
Summary ...................................................................................................................... 497
■■Chapter 12: Deep Learning for Computer Vision ............................................... 499
Convolutional Neural Networks .................................................................................... 499
Image Classification with CNNs ................................................................................... 501
Problem Statement .............................................................................................................................501
Dataset ...............................................................................................................................................501
CNN Based Deep Learning Classifier from Scratch ............................................................................ 502
CNN Based Deep Learning Classifier with Pretrained Models ............................................................ 505
Artistic Style Transfer with CNNs ................................................................................. 509
Background ........................................................................................................................................510
Preprocessing .....................................................................................................................................511
Loss Functions ....................................................................................................................................513
Custom Optimizer ...............................................................................................................................515
Style Transfer in Action .......................................................................................................................516
Summary ...................................................................................................................... 520
Index ..................................................................................................................... 521



Introduction
Data is the new oil, and Machine Learning is a powerful concept and framework for making the most of it. In this age of automation and intelligent systems, it is hardly a surprise that Machine Learning and Data Science are among the top buzzwords. The tremendous interest and renewed investment in Data Science across industries, enterprises, and domains are clear indicators of its enormous potential. Intelligent systems and data-driven organizations are becoming a reality, and advances in tools and techniques are only helping them expand further. With data being of paramount importance, there has never been a higher demand for Machine Learning and Data Science practitioners than there is now. Indeed, the world is facing a shortage of data scientists. The role has been called “the sexiest job of the 21st century,” which makes it all the more worthwhile to build valuable expertise in this domain.
Practical Machine Learning with Python is a problem solver’s guide to building real-world intelligent systems. It follows a comprehensive three-tiered approach packed with concepts, methodologies, hands-on examples, and code. This book helps its readers master the essential skills needed to recognize and solve complex problems with Machine Learning and Deep Learning by following a data-driven mindset. Using real-world case studies that leverage the popular Python Machine Learning ecosystem, this book is your perfect companion for learning the art and science of Machine Learning to become a successful practitioner.
The concepts, techniques, tools, frameworks, and methodologies used in this book will teach you how to think about, design, build, and execute Machine Learning systems and projects successfully.
This book will get you started on the ways to leverage the Python Machine Learning ecosystem with its diverse set of frameworks and libraries. It follows a three-tiered approach: the first part builds a strong foundation in the basics of Machine Learning and the relevant tools and frameworks, the second part emphasizes the core processes around building Machine Learning pipelines, and the final part applies this knowledge to real-world case studies from diverse domains, including retail, transportation, movies, music, computer vision, art, and finance. We also cover a wide range of Machine Learning models, including regression, classification, forecasting, rule-mining, and clustering. This book also touches on cutting-edge methodologies and research from the field of Deep Learning, including concepts like transfer learning and case studies relevant to computer vision, such as image classification and neural style transfer. Each chapter consists of detailed concepts with complete hands-on examples, code, and detailed discussions.
The main intent of this book is to give a wide range of readers (including IT professionals, analysts, developers, data scientists, engineers, and graduate students) a structured approach to gaining essential Machine Learning skills, along with enough knowledge of state-of-the-art Machine Learning techniques and frameworks to start solving their own real-world problems.
This book is application-focused, so it is not a substitute for deep conceptual and theoretical knowledge of Machine Learning algorithms, methods, and their internal implementations. We strongly recommend that you supplement the practical knowledge gained through this book with some standard books on data mining, statistical analysis, and the theoretical aspects of Machine Learning algorithms and methods to gain deeper insights into the world of Machine Learning.