Great scikit-learn Tutorial
and a great blog in general on Python and Bayesian methods:
This blog gives very good hints for comparing RStudio and Spyder (the Anaconda IDE for Python):
This information came from a post by Santiago Egea (Universidad de Valladolid)
With IPython Notebook + nbviewer you get functionality similar to R Markdown + RPubs.
You create your documents and then view them through nbviewer. Actually, you don't store any documents in nbviewer itself; it just accesses your documents, which are usually on GitHub.
Here is an excellent list of Python tools that you can use for your machine learning projects:
extracted from this reference:
- NumPy/SciPy: You probably know about these already. But let me point out the Cookbook, where you can read about many statistical facilities already available, and the Example List, which is a great reference for functions (including data manipulation and other operations). Another handy reference is John Cook’s Distributions in SciPy.
- pandas: This is a really nice library for working with statistical data (tabular data, time series, panel data). Includes many built-in functions for data summaries, grouping/aggregation, and pivoting. Also has a statistics/econometrics library.
- larry: Labeled array that plays nicely with NumPy. Provides statistical functions not present in NumPy and is good for data manipulation.
- python-statlib: A fairly recent effort which combined a number of scattered statistics libraries. Useful for basic and descriptive statistics if you’re not using NumPy or pandas.
- statsmodels: Statistical modeling: linear models, GLMs, among others.
- scikits: Statistical and scientific computing packages, notably smoothing, optimization and machine learning.
- PyMC: For your Bayesian/MCMC/hierarchical modeling needs. Highly recommended.
- PyMix: Mixture models.
If speed becomes a problem, consider Theano, which the deep learning community has used with good success.
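To give a feel for the pandas features mentioned in the list above (data summaries, grouping/aggregation, pivoting), here is a minimal sketch. The data, the column names (`city`, `year`, `sales`) and the variable names are all made up for illustration; they do not come from any of the linked references.

```python
import pandas as pd

# Hypothetical toy data: every name and value here is invented for illustration.
df = pd.DataFrame({
    "city": ["Madrid", "Madrid", "Valladolid", "Valladolid"],
    "year": [2013, 2014, 2013, 2014],
    "sales": [10.0, 12.0, 4.0, 6.0],
})

# Data summary: count, mean, std, min, quartiles and max of a numeric column.
summary = df["sales"].describe()

# Grouping/aggregation: mean sales per city.
mean_by_city = df.groupby("city")["sales"].mean()

# Pivoting: one row per city, one column per year.
pivot = df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")

print(summary)
print(mean_by_city)
print(pivot)
```

The same three steps in R would typically be `summary()`, `aggregate()` and a reshape, so this is a reasonable first thing to try when moving from RStudio to Spyder.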
Information on the tools:
For a short summary of pandas: