Bias vs variance

This post is a great summary of the bias versus variance dilemma and of ways to address it:

https://www.linkedin.com/pulse/6-ways-make-your-predictive-models-better-ahmed-el-deeb
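
To make the dilemma concrete, here is a minimal sketch that estimates squared bias and variance for two deliberately extreme models. Everything in it is an assumption chosen for illustration and is not taken from the linked post: the true function (a sine), the noise level, and the two models (a constant predictor and a 1-nearest-neighbour predictor).

```python
import math
import random


def true_fn(x):
    # Assumed ground truth, chosen only for this illustration.
    return math.sin(x)


def fit_constant(xs, ys):
    # Very rigid model: always predicts the mean of the training targets.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_nearest_neighbour(xs, ys):
    # Very flexible model: copies the target of the closest training point.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bias_variance(fit, train_xs, test_xs, noise_sd, trials, rng):
    # predictions[j] collects the predictions made at test_xs[j] in every trial.
    predictions = [[] for _ in test_xs]
    for _ in range(trials):
        # Each trial draws a fresh noisy training set from the same true function.
        ys = [true_fn(x) + rng.gauss(0.0, noise_sd) for x in train_xs]
        model = fit(train_xs, ys)
        for j, x in enumerate(test_xs):
            predictions[j].append(model(x))

    bias_sq = 0.0
    variance = 0.0
    for x, preds in zip(test_xs, predictions):
        mean_pred = sum(preds) / len(preds)
        bias_sq += (mean_pred - true_fn(x)) ** 2
        variance += sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    # Average both quantities over the test points.
    return bias_sq / len(test_xs), variance / len(test_xs)


rng = random.Random(0)
train_xs = [2 * math.pi * i / 19 for i in range(20)]
test_xs = [2 * math.pi * (i + 0.5) / 19 for i in range(19)]

results = {}
for name, fit in [("constant", fit_constant), ("1-NN", fit_nearest_neighbour)]:
    results[name] = bias_variance(fit, train_xs, test_xs, 0.3, 300, rng)
    print(name, "bias^2 = %.4f, variance = %.4f" % results[name])
```

In this setup the constant model should show the larger squared bias and the nearest-neighbour model the larger variance, which is the trade-off the post discusses.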

Getting started with deep learning

Some places to start:

http://deeplearning.net/

http://deeplearning.stanford.edu/

http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/

http://www.cs.toronto.edu/~hinton/

http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial

http://www.quora.com/Whats-the-most-effective-way-to-get-started-with-Deep-Learning

https://www.kaggle.com/c/facial-keypoints-detection/details/deep-learning-tutorial

http://www.kaggle.com/forums/f/15/kaggle-forum/t/9452/advice-on-learning-deep-learning-neural-networks

http://www.kdnuggets.com/2014/05/learn-deep-learning-courses-tutorials-overviews.html

Intuitive and practical PCA, SVD, dimensionality reduction and all that

Links to places that explain well how to do PCA and how to understand it:

http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

http://math.stackexchange.com/questions/1146/intuitive-way-to-understand-principal-component-analysis

http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

http://www.cerebralmastication.com/2010/09/principal-component-analysis-pca-vs-ordinary-least-squares-ols-a-visual-explination/
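
As a companion to those explanations, here is a small sketch of PCA on 2-D points using only the standard library. It relies on the closed-form eigen-decomposition of a symmetric 2x2 covariance matrix, so it does not generalize beyond two dimensions; for real work NumPy or scikit-learn would be the usual choice. The function name `pca_2d` and the toy data are made up for this example.

```python
import math


def pca_2d(points):
    """PCA for 2-D points via the eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Step 1: centre the data.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Step 2: sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Step 3: eigenvalues of a symmetric 2x2 matrix (largest first).
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    eigenvalues = (mid + half_gap, mid - half_gap)

    # Step 4: the first principal axis makes this angle with the x-axis.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Step 5: project the centred points onto the principal axes.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centered]
    return eigenvalues, (pc1, pc2), scores


# Toy data: points near the line y = 2x + 1 with a small alternating offset.
points = [(float(x), 2.0 * x + 1.0 + (0.5 if x % 2 else -0.5)) for x in range(10)]
eigenvalues, (pc1, pc2), scores = pca_2d(points)
explained = eigenvalues[0] / sum(eigenvalues)

print("first principal direction:", pc1)
print("share of variance on the first component: %.4f" % explained)
```

Keeping only the first column of `scores` is the dimensionality-reduction step: each 2-D point is replaced by its single coordinate along the first principal axis.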

Using Python for data analysis and machine learning

Here is an excellent list of Python tools that you can use for your machine learning projects:

http://stats.stackexchange.com/questions/1595/python-as-a-statistics-workbench

The following list is extracted from that reference:

  • NumPy/SciPy: You probably know about these already. But let me point out the Cookbook where you can read about many statistical facilities already available and the Example List which is a great reference for functions (including data manipulation and other operations). Another handy reference is John Cook’s Distributions in SciPy.
  • pandas: This is a really nice library for working with statistical data: tabular data, time series, panel data. Includes many built-in functions for data summaries, grouping/aggregation, pivoting. Also has a statistics/econometrics library.
  • larry: Labeled array that plays nice with NumPy. Provides statistical functions not present in NumPy and good for data manipulation.
  • python-statlib: A fairly recent effort which combined a number of scattered statistics libraries. Useful for basic and descriptive statistics if you’re not using NumPy or pandas.
  • statsmodels: Statistical modeling: linear models, GLMs, among others.
  • scikits: Statistical and scientific computing packages, notably smoothing, optimization and machine learning.
  • PyMC: For your Bayesian/MCMC/hierarchical modeling needs. Highly recommended.
  • PyMix: Mixture models.

If speed becomes a problem, consider Theano — used with good success by the deep learning people.

Documentation for these tools:

For pandas:

http://pandas.pydata.org/pandas-docs/dev/10min.html

For a short summary of pandas:

http://www.bigdataexaminer.com/exploratory-data-analysis-in-python-using-pandas-matplotlib-and-numpy/

For NumPy/SciPy:

http://wiki.scipy.org/Cookbook

For GLMs (statsmodels):

http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/glm.html

For Monte Carlo (PyMC):

http://pymc-devs.github.io/pymc/tutorial.html#
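
To give a feel for what an MCMC sampler does under the hood, here is a minimal random-walk Metropolis sketch in plain Python. It does not use PyMC's API. The problem (the bias of a coin after 7 heads in 10 flips, with a uniform prior), the step size and the function names are all assumptions picked for illustration.

```python
import math
import random


def log_posterior(p, heads, flips):
    # Uniform prior on (0, 1) plus a binomial likelihood, up to a constant.
    if not 0.0 < p < 1.0:
        return -math.inf
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)


def metropolis(heads, flips, n_samples, step_sd, rng, burn_in=1000):
    p = 0.5
    current = log_posterior(p, heads, flips)
    samples = []
    for i in range(n_samples + burn_in):
        # Propose a small Gaussian step away from the current value.
        proposal = p + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, posterior ratio), done in log space.
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            p, current = proposal, candidate
        # Discard the first burn_in iterations, keep the rest.
        if i >= burn_in:
            samples.append(p)
    return samples


rng = random.Random(42)
heads, flips = 7, 10
samples = metropolis(heads, flips, n_samples=20000, step_sd=0.2, rng=rng)

posterior_mean = sum(samples) / len(samples)
# With a uniform prior the exact posterior is Beta(heads + 1, flips - heads + 1).
exact_mean = (heads + 1) / (flips + 2)
print("sampled mean %.3f, exact mean %.3f" % (posterior_mean, exact_mean))
```

The sampled mean can be checked against the exact Beta posterior mean here only because this toy problem has a closed-form answer. Libraries such as PyMC are meant for models where no such formula is available.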