Even apart from HDF5, managing large DataFrames that don't fit into memory is a challenging and important problem. Add the prospect of writing to the DataFrame, and doing it all efficiently becomes harder still. My understanding is that K/KDB/Q is masterful at efficiently managing large arrays that cannot fit into memory. Interestingly, I see there is now an open-source variant of K itself, called Kona. I'm not sure how significant that is, but in any case, I hope that Pandas will one day catch up in the area of high-performance disk-based array processing.