Hello Azimuth friends,

I've been having a great time learning the Pandas library, which is built on top of the Python language and is a key part of the "scientific Python ecosystem." I'm starting this thread, and another one, to share some of these ideas. I'm also hoping to generate some raw material here for a blog article.

Here is a classic reference book:

* [Python for Data Analysis](https://books.google.com/books/about/Python_for_Data_Analysis.html?id=v3n4_AK8vu0C&printsec=frontcover&source=kp_read_button#v=onepage&q&f=false), Wes McKinney, O'Reilly Media, 2013.

Here is a recommended primer from the Pandas website:

* [10 Minutes to Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html).

Here are the main components of the scientific python ecosystem. I am paraphrasing/quoting from McKinney:

* NumPy. Short for Numerical Python, NumPy is the foundational package for scientific computing in Python. It provides a fast and efficient multi-dimensional array object; functions for performing element-wise computations with arrays or mathematical operations between arrays; tools for reading and writing array-based data sets to disk; linear algebra operations, Fourier transforms, and random number generation; and tools for integrating other languages with Python.
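As a tiny sketch of the array features just listed (nothing here beyond standard NumPy; the array values are made up for illustration):

```python
import numpy as np

# A fast multi-dimensional array object
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Element-wise computation: the scalar operations broadcast over every element
b = a * 10 + 1

# Linear algebra: solve the system a @ x = rhs
rhs = np.array([1.0, 1.0])
x = np.linalg.solve(a, rhs)

# Random number generation from a seeded generator
rng = np.random.default_rng(seed=0)
sample = rng.normal(size=5)
```

Everything is vectorized: `a * 10 + 1` loops over the array in compiled code rather than in Python.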

* pandas. Pandas provides rich data structures and functions designed to make working with structured data fast, easy and expressive. The primary object in pandas is the DataFrame, a two-dimensional tabular, column-oriented structure with both row and column labels. Pandas combines the high performance array-computing features of NumPy with the flexible data manipulation capabilities of spreadsheets and relational databases.

And, I may add: it is seamlessly integrated with the mature, high-level Python language, which offers mechanisms for abstraction, functional programming, and object orientation, along with extensive libraries for systems programming, web service interfaces, and much more.

For users of the R statistical computing language, the DataFrame name will be familiar, as it was named after the similar R data.frame object. They are not the same, however: the functionality provided by the R data frame is essentially a strict subset of that provided by the pandas DataFrame.
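Here is a minimal sketch of the labeled, column-oriented structure described above (the column names and values are of course just made up for illustration):

```python
import pandas as pd

# A DataFrame: two-dimensional tabular data with row and column labels
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"],
     "temp_c": [4.5, 19.0, 27.5]},
    index=["a", "b", "c"])

# Column-oriented access, with NumPy-style vectorized arithmetic
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Label-based row selection
row = df.loc["b"]
```

Note the spreadsheet-like feel: adding the `temp_f` column is one vectorized expression, with no explicit loop over rows.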

* matplotlib. The most popular Python library for producing plots and other 2D visualizations. It is maintained by a large team of developers, and is well-suited for creating publication-quality plots.

* IPython. IPython is the component in the toolset that ties everything together; it provides a robust and productive environment for interactive and exploratory computing.

* SciPy. SciPy is a collection of packages addressing a number of different standard problem domains in scientific computing. It includes: scipy.integrate, with numerical integration routines and differential equation solvers; scipy.linalg, with linear algebra and matrix decomposition algorithms; scipy.optimize, with function optimizers and root-finding algorithms; scipy.signal, with signal processing tools; scipy.sparse, for sparse matrices and sparse linear system solvers; scipy.stats, with standard continuous and discrete probability distributions, statistical tests, and descriptive statistics; scipy.weave, a tool for using inline C++ code to accelerate array computations.
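As a quick taste of two of the subpackages above (nothing here beyond documented SciPy calls):

```python
import numpy as np
from scipy import integrate, optimize

# scipy.integrate: numerically integrate sin(x) over [0, pi]
area, err = integrate.quad(np.sin, 0, np.pi)   # exact answer is 2

# scipy.optimize: find the root of cos(x) in the bracket [0, 2]
root = optimize.brentq(np.cos, 0, 2)           # exact answer is pi/2
```

Both calls return to machine precision here, which is typical of the adaptive algorithms SciPy wraps.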

Together NumPy and SciPy form a reasonably complete computational replacement for much of MATLAB along with some of its add-on toolboxes.

And, I may add: it is free!
