  • NumPy/SciPy You probably know about these already. But let me point out the Cookbook, where you can read about many statistical facilities already available, and the Example List, which is a great reference for functions (including data manipulation and other operations). Another handy reference is John Cook’s Distributions in SciPy.
  • pandas This is a really nice library for working with statistical data: tabular data, time series, panel data. It includes many built-in functions for data summaries, grouping/aggregation, and pivoting. It also has a statistics/econometrics library.
  • larry A labeled array that plays nicely with NumPy. It provides statistical functions not present in NumPy and is good for data manipulation.
  • python-statlib A fairly recent effort which combined a number of scattered statistics libraries. Useful for basic and descriptive statistics if you’re not using NumPy or pandas.
  • statsmodels Statistical modeling: Linear models, GLMs, among others.
  • scikits Statistical and scientific computing packages — notably smoothing, optimization and machine learning.
  • PyMC For your Bayesian/MCMC/hierarchical modeling needs. Highly recommended.
  • PyMix Mixture models.
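To give a flavor of the pandas features mentioned above (data summaries, grouping/aggregation, pivoting), here is a minimal sketch. The toy data and the column names `group`, `year`, and `value` are made up for illustration, and it assumes a reasonably recent pandas is installed.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for this example.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "year": [2010, 2011, 2010, 2011, 2011],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Data summary: count, mean, std, min, quartiles, max of a numeric column.
summary = df["value"].describe()

# Grouping/aggregation: mean and number of observations per group.
by_group = df.groupby("group")["value"].agg(["mean", "count"])

# Pivoting: groups as rows, years as columns, mean value in each cell.
pivoted = df.pivot_table(index="group", columns="year",
                         values="value", aggfunc="mean")

print(summary)
print(by_group)
print(pivoted)
```

The same pattern should scale from a toy frame like this to real tabular data; only the column names change.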

If speed becomes a problem, consider Theano — used with good success by the deep learning people.

Excellent follow-up courses on machine learning … Carnegie Mellon, Tom Mitchell

Andrew Ng, CS 229. It is different from, and more advanced than, the Coursera course by Ng.

Machine Learning by Pedro Domingos, University of Washington