Incredibly good introduction to Subversion

Best introduction to Subversion →  Day-to-day with Subversion


What are orthogonal polynomials and how are they used in linear regression models

Well, I am doing the Coursera StatsLearning course from Stanford and I didn't understand the use of orthogonal polynomials in a linear regression model.

After a lot of looking around on the web I have finally understood how it all fits together.

In linear regression with a polynomial of degree d you try to find the coefficients \alpha_j that minimize the sum of squared errors of the model y_i=\alpha_0+ \alpha_1 x_i+\dots+\alpha_d x_i^d, where i runs over all the samples we have and j runs over the powers of x from 0 up to the degree d we are using to fit the data.

When we use orthogonal polynomials, we instead fit the data with the following expression:

y_i=\alpha_0+ \alpha_1 (a_{11} x_i+a_{10} )+\alpha_2 (a_{22} x_i^2+a_{21}x_i+a_{20})+\dots

where the polynomials p_j(x) =a_{jj} x^j+\dots+a_{j1}x+a_{j0} are orthogonal to each other. Here orthogonal means that:

\sum\limits_{k=1}^N p_i (x_k) p_j (x_k) = 0; i\neq j; where N is the number of samples.

So the coefficients of the polynomials are chosen to make the above sum equal to zero for every pair of different polynomials, and these are the polynomials that R provides when you use the poly function inside an lm expression.
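To convince myself, I wrote a small sketch in plain Python (not R) that builds such polynomials by Gram-Schmidt orthogonalization of the columns 1, x, x^2, ... evaluated at the sample points. The function names and the toy data are my own, and I am not claiming this is how poly computes its result; in particular the scaling of the columns may differ from what poly returns. It also shows one practical consequence of orthogonality that I am sure of: each \alpha_j can be computed on its own, as the sum of y_k p_j(x_k) divided by the sum of p_j(x_k)^2.

```python
def dot(u, v):
    """Sum of products over the samples, i.e. the sum in the orthogonality condition."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_poly_columns(xs, degree):
    """Return the values p_0(x_k), ..., p_degree(x_k) as a list of columns.

    Each column starts as the monomial x^j and has its projection onto
    every earlier column subtracted (Gram-Schmidt), so the columns end up
    mutually orthogonal over the sample points. Needs more than `degree`
    distinct x values, otherwise a column collapses to zero.
    """
    columns = []
    for j in range(degree + 1):
        col = [x ** j for x in xs]
        for prev in columns:
            coef = dot(col, prev) / dot(prev, prev)
            col = [c - coef * p for c, p in zip(col, prev)]
        columns.append(col)
    return columns


def fit_coefficients(ys, columns):
    """Least-squares alpha_j when the columns are orthogonal: one ratio per column."""
    return [dot(ys, col) / dot(col, col) for col in columns]


# Toy data (made up for illustration): y is exactly linear in x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x for x in xs]

cols = orthogonal_poly_columns(xs, 3)

# Every sum of p_i(x_k) * p_j(x_k) with i != j should be zero up to rounding.
cross_sums = [dot(cols[i], cols[j]) for i in range(len(cols)) for j in range(i)]
print(max(abs(s) for s in cross_sums))

# Coefficients in the orthogonal basis; the degree 2 and 3 ones should be
# close to zero because the data is linear.
alphas = fit_coefficients(ys, cols)
print(alphas)
```

Note that the \alpha_j here are coefficients of the orthogonal polynomials, not of the raw powers of x, so they are not the same numbers you would get from fitting y_i=\alpha_0+ \alpha_1 x_i+\dots directly, even though the fitted curve is the same.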

These are the links I used to clarify the topic:

http://www.statsci.org/smyth/pubs/EoB/bap064-.pdf

http://math.stackexchange.com/questions/279608/how-to-work-out-orthogonal-polynomials-for-regression-model

http://books.google.es/books?id=YJkY2LGLbI8C&printsec=frontcover&hl=es&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Retrospective sampling or case-control sampling

When the prior probabilities of the classes that we want to classify are very imbalanced, it is a good idea to use retrospective (case-control) sampling.

For example, you can fit a logistic regression with case-control sampling. Use around 4-6 times as many controls as cases, and then correct the intercept \beta_0 of the fitted model to account for the sampling. See https://class.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/classification.pdf (page 16) and also http://support.sas.com/kb/22/601.html for an explanation.
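As I understand it, the correction is \hat{\beta}_0^* = \hat{\beta}_0 + \log(\pi/(1-\pi)) - \log(\tilde{\pi}/(1-\tilde{\pi})), where \pi is the proportion of cases in the whole population and \tilde{\pi} is the proportion of cases in the sample you fitted on. Here is a minimal Python sketch of that adjustment. The function names and all the numbers are made up for illustration, and you need to know (or estimate) the population proportion \pi from somewhere else.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def adjust_intercept(beta0_hat, sample_case_fraction, population_case_fraction):
    """Shift an intercept fitted on a case-control sample to population odds.

    Only the intercept changes; the other coefficients are left as fitted.
    """
    return beta0_hat + logit(population_case_fraction) - logit(sample_case_fraction)


# Hypothetical example: 5 controls per case in the sample (case fraction 1/6),
# while only 1% of the population are cases.
beta0_hat = -1.2
adjusted = adjust_intercept(
    beta0_hat,
    sample_case_fraction=1.0 / 6.0,
    population_case_fraction=0.01,
)
print(adjusted)
```

Because cases are over-represented in the sample, the adjusted intercept comes out lower than the fitted one, which pushes the predicted probabilities down towards the population rate.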