For the context and framing of this discussion, please see the prior thread:

* [Preamble to the analysis of the Pandas/Python data analysis framework](

**Python data types**

Python provides a range of standard types, including primitive value types (int, float, etc.), lists, tuples, dictionaries (i.e. finite mappings), functions, and objects. For tutorials and reference information, see:

* [Python project page](

**ndarray (NumPy)**

The Python module NumPy provides an n-dimensional array type, ndarray. All the elements of an ndarray must have the same type (its dtype), which allows the data to be packed into a contiguous block of memory. This representation is efficient, and it makes ndarray a good format for interfacing with libraries that are external to Python. NumPy also provides operators that apply element-wise operations to entire arrays (vectorization). So, even though the Python interpreter is slower than statically typed compiled languages, vectorized operators on large data sets run the critical inner loops in the compiled NumPy library rather than in the Python interpreter.
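As a minimal sketch of vectorization, the following computes the same result twice: once with an explicit Python loop and once with NumPy's element-wise operators. The variable names and the formula are illustrative only.

```python
import numpy as np

# A small homogeneous array; every element shares the dtype float64.
values = np.arange(5, dtype=np.float64)

# Interpreter-level loop: each iteration is executed by Python.
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2 + 1.0

# Vectorized form: the loop over elements runs inside compiled NumPy code.
squared_vec = values ** 2 + 1.0

print(squared_vec)
print(values.dtype, values.flags["C_CONTIGUOUS"])
```

On an array this small the two forms are indistinguishable in speed; the difference shows up on large data sets.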

**Series and DataFrame (Pandas)**

These two data types (classes in the Pandas module) are built on top of the ndarray data type. They are enrichments of, respectively, the mathematical types Sequence and Relation. A Series is a sequence of values with associated labels, and a DataFrame is a two-dimensional, column-oriented structure with row and column labels.
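To make the two types concrete, here is a small sketch that builds a labelled Series and a DataFrame and then looks values up by label. The data (fruit names, prices, quantities) is made up for illustration.

```python
import pandas as pd

fruit = ["apple", "banana", "cherry"]

# A Series: a sequence of values, each with an associated label.
prices = pd.Series([1.5, 2.0, 3.25], index=fruit)

# A DataFrame: named columns sharing one set of row labels.
table = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=fruit,
)

banana_price = prices["banana"]   # look up a value by its label
qty_column = table["qty"]         # selecting a column yields a Series
cherry_row = table.loc["cherry"]  # selecting a row by label also yields a Series

print(banana_price)
print(qty_column)
print(cherry_row)
```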

**Index (Pandas)**

An Index is an object that provides the sequences of labels that are used in the Series and DataFrame objects. An Index may contain multiple levels of hierarchy within it.
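The sketch below shows a flat Index and a hierarchical one. In Pandas the multi-level case is represented by the MultiIndex class, a subclass of Index; the level names "group" and "item" are hypothetical, chosen only for this example.

```python
import pandas as pd

# A flat Index supplying the labels of a Series.
labels = pd.Index(["a", "b", "c"])
flat = pd.Series([10, 20, 30], index=labels)

# A two-level hierarchical index built from (outer, inner) label pairs.
multi = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)],
    names=["group", "item"],
)
nested = pd.Series([1.0, 2.0, 3.0], index=multi)

# Selecting on the outer level returns the entries under "x",
# now labelled by the inner level only.
x_group = nested.loc["x"]

# A full (outer, inner) pair selects a single value.
y1 = nested.loc[("y", 1)]

print(x_group)
print(y1)
```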

This thread will consist of an exposition of the algebra of Series and DataFrames, along with examples of their use.