# [Book Review] Python Data Analysis

“Python Data Analysis” presents a complete Python toolkit for managing, manipulating, and visualizing data. Data analysis is a complex field, but the author, Ivan Idris, clearly explains how to implement advanced algorithms in real-world Python applications.

The first three chapters cover the basics of data analysis, such as arrays, statistics, and linear algebra. The libraries used are NumPy and SciPy, with matplotlib for visualization. The next chapter gives a clear description of the pandas project, one of the most popular libraries for data analysis. Ivan provides step-by-step directions, starting with installation and moving on to querying and basic data manipulation (statistics, aggregation, pivoting, etc.) with pandas.

Before continuing to more advanced methods, the book describes how to retrieve, process, and store data. Python has libraries for reading a wide range of formats, e.g. CSV and Excel (NumPy and pandas), JSON (the built-in json package), RSS (feedparser), and HTML (BeautifulSoup). Data analysis using signal processing and time series representation is covered in Chapter 7. There we are introduced to the statsmodels library, which provides moving average techniques, window functions, and datasets for experiments. Autoregressive models, ARMA, Fourier, and spectral analysis are provided by NumPy and SciPy. All these methods are theoretically complex but simple to implement in Python.
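As a rough illustration of the retrieve, process, and store steps described above, here is a minimal sketch that uses only the standard library (`csv` and `json`). The sample readings, the helper names, and the window size are assumptions made up for this example. It is not code from the book, which relies on NumPy, pandas, and statsmodels for these tasks.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """day,temperature
1,10.0
2,12.0
3,14.0
4,16.0
5,18.0
"""


def load_temperatures(text):
    """Retrieve: parse CSV text into a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["temperature"]) for row in reader]


def moving_average(values, window):
    """Process: simple (unweighted) moving average over a sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def to_json(values, smoothed):
    """Store: serialize the raw and smoothed series as a JSON string."""
    return json.dumps({"raw": values, "smoothed": smoothed})


temperatures = load_temperatures(CSV_TEXT)
smoothed = moving_average(temperatures, window=3)
stored = to_json(temperatures, smoothed)
print(stored)
```

With real data, the same three steps would more likely be written with `pandas.read_csv` and a rolling mean. The plain-Python version only shows how little code the idea itself needs.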

Data is usually stored in a database. The book explains how to work with databases using supported packages, namely sqlite3, SQLAlchemy, Pony ORM, PyMongo, and Redis. Every library is explained and followed by example source code. Programmers always love working code, not just pseudocode :). Emerging trends such as social media are also discussed in this book. Data generated from social media can be analysed for opinion mining or sentiment analysis, which automates the evaluation of opinions or expressions in social media. The library used is the Natural Language Toolkit (NLTK). Chapter 10 gives a detailed description of the scikit-learn library, with examples of machine learning for data analysis.
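To give a flavour of the database part, here is a small sketch with the standard-library `sqlite3` module. The `readings` table, its columns, and the sample rows are hypothetical and chosen only for illustration. They are not taken from the book.

```python
import sqlite3

# An in-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

# Hypothetical sample rows.
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
conn.commit()

# Aggregate in SQL: average value per sensor.
cursor = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
)
averages = dict(cursor.fetchall())
conn.close()

print(averages)
```

Parameterized queries (the `?` placeholders) keep the data separate from the SQL text, which is the usual way to insert values safely.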

In conclusion, this book comprehensively covers data analysis techniques using Python. I can only say “two thumbs up!” for “Python Data Analysis”!

February 11, 2015