mm-2379
===
Subject: Re: What kind of correlation is this one???

no no no you got it all wrong. x=y only if c=d. then you get the binary hyperpolie of the gradient x and product that with an elephant, dummy!
===
Subject: Re: multiple regression (intercept)

Consider the meaning of the intercept. It is the expected value of Y when the X's are equal to zero. So, when var1, var2, and var3 (whatever they might be) are all equal to zero, then the expected value for Y, assuming linearity, is -2000.

The intercept is often not meaningful, particularly when it is far outside of the region of the observed values. Nonetheless, it is necessary to describe the fit and make proper predictions within the range of the observed data.

If you must have some meaningful interpretation of the intercept, center your predictor variables by subtracting the mean of the predictor from each of its values. The variables are now expressed as deviations about the mean. Now, when var1, var2, and var3 equal zero, they are at their means. This does not change the other properties of the model. But the intercept might have a useful interpretation as the expected value of Y at the mean of all the X's.

Brett

> Hi illywhacker,
>
> [What do you mean the intercept does not 'mean anything'?]
> Well, because a negative intercept doesn't mean anything in my study: it means that the company should pay every new consumer (if var1, var2 and var3 = 0). That makes no sense!
>
> [Do you mean that you know in advance that the intercept is zero?]
> No, intercept = -2000 (here is my equation: Y = -2000 + 544*var1 + 1166*var2 - 487*var3).
>
> [Either way, if you know these things, you must impose the constraint]
> What do you mean? I don't understand.
===
Subject: Re: multiple regression (intercept)

> thierry,
>
> Consider the meaning of the intercept. It is the expected value of Y when the X's are equal to zero.
> So, when var1, var2, and var3 (whatever they might be) are all equal to zero, then the expected value for Y, assuming linearity, is -2000.
>
> The intercept is often not meaningful, particularly when it is far outside of the region of the observed values. Nonetheless, it is necessary to describe the fit and make proper predictions within the range of the observed data.
>
> If you must have some meaningful interpretation of the intercept, center your predictor variables by subtracting the mean of the predictor from each of its values. The variables are now expressed as deviations about the mean. Now, when var1, var2, and var3 equal zero, they are at their means. This does not change the other properties of the model. But the intercept might have a useful interpretation as the expected value of Y at the mean of all the X's.
>
> Brett

Nicely summarized, Brett. I'll just add that the variables do not have to be centred on the mean. You can centre them on any (sensible) value you wish (e.g., the minimum or maximum, quartiles, etc.). Obviously, it is important to be clear about what you did when reporting the results.

--
Bruce Weaver
bweaver@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
===
Subject: Re: multiple regression (intercept)

> Well, because a negative intercept doesn't mean anything in my study: it means that the company should pay every new consumer (if var1, var2 and var3 = 0). That makes no sense!

... the regression equation tells you what *has happened* on average, and why would you not expect a company to have paid for new customers? If cost includes overhead, advertising, salesmen's salaries, etc., getting a new customer may cost money. This is offset by the large positive coefficient of duration (keeping the customer?). Granted, it might be better to be in the black from the start, but that's a question of business model, I suppose.
As a data analyst, one needs to understand where the terms in the equation come from and what they mean in a larger context to determine whether or not the results make sense. While at first glance it may not make sense to have to pay for a new customer, in fact it may. That is one of the difficulties of answering questions in this forum. We do not know much of the underlying information, like what constitutes cost.

snip

Russell
===
Subject: Re: multiple regression (intercept)

Hi Anon.

Anon. wrote in message kAiBe.9900$qg1.786183@news20.bellglobal.com...
>>> I ask this question because I have a negative intercept in my equation (Y = -2000 + 544*var1 + 1166*var2 - 487*var3); in my case the intercept doesn't mean anything.
>
>> Either way, if you know these things, you must impose the constraint, otherwise you will end up with nonsense.
>
> I think you're being excessive here: you can get nonsense if you impose a constraint too.
>
> One can fit a model without an intercept, but this is usually advised against. The model assumes that the response varies linearly with the covariates. This is probably not precisely true, but may well be good enough over the range of the data. (As an aside: this should be checked, e.g. by plotting the residuals against the covariates.) If there is non-linearity outside the range of the data, it won't be picked up in the analysis. If you force the fit through the origin, you can get a very misleading model, one that is wrong everywhere rather than just at one point.

Well, the model with intercept is not just wrong at one point, but over a range of values where the behaviour is nonlinear. In this case, it seems that the OP is rather concerned with the region near the origin, where the behaviour may be nonlinear, in which case the linear model with intercept is not of much use. Neither of course will the linear model without intercept be of much use if it does not correspond well with the behaviour.
The solution is to use a better model, or several models, and to make model choices.

illywhacker;
===
Subject: Re: multiple regression (intercept)

> Hi Anon.
>
> Anon. wrote in message kAiBe.9900$qg1.786183@news20.bellglobal.com...
>>>> I ask this question because I have a negative intercept in my equation (Y = -2000 + 544*var1 + 1166*var2 - 487*var3); in my case the intercept doesn't mean anything.
>
>>> Either way, if you know these things, you must impose the constraint, otherwise you will end up with nonsense.
>
>> I think you're being excessive here: you can get nonsense if you impose a constraint too.
>>
>> One can fit a model without an intercept, but this is usually advised against. The model assumes that the response varies linearly with the covariates. This is probably not precisely true, but may well be good enough over the range of the data. (As an aside: this should be checked, e.g. by plotting the residuals against the covariates.) If there is non-linearity outside the range of the data, it won't be picked up in the analysis. If you force the fit through the origin, you can get a very misleading model, one that is wrong everywhere rather than just at one point.
>
> Well, the model with intercept is not just wrong at one point, but over a range of values where the behaviour is nonlinear. In this case, it seems that the OP is rather concerned with the region near the origin, where the behaviour may be nonlinear, in which case the linear model with intercept is not of much use. Neither of course will the linear model without intercept be of much use if it does not correspond well with the behaviour. The solution is to use a better model, or several models, and to make model choices.

I interpreted the OP's question as one where he had noticed something that seemed silly, but there was nothing in his comments to suggest that his data was anywhere near this region of the parameter space.
If it's nowhere near, and the linear model seems to fit OK, then I would suggest not worrying about it. OTOH, if the linear model doesn't fit OK, or if the OP is intending to use the model near to the origin, then yes, he should improve the model. One problem is that if there's no data near the origin, then it's difficult to see how to select a better model: there's no information in the data in that part of the parameter space.

Bob

--
Bob O'Hara
Dept. of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org
===
Subject: Re: multiple regression (intercept)

Hi Anon.

Anon. wrote in message
> I interpreted the OP's question as one where he had noticed something that seemed silly, but there was nothing in his comments to suggest that his data was anywhere near this region of the parameter space. If it's nowhere near, and the linear model seems to fit OK, then I would suggest not worrying about it. OTOH, if the linear model doesn't fit OK, or if the OP is intending to use the model near to the origin, then yes, he should improve the model. One problem is that if there's no data near the origin, then it's difficult to see how to select a better model: there's no information in the data in that part of the parameter space.

You may be right that he is not concerned with the region near the origin of the data space. If so, there is not much more to say. Still, one could construct a more sophisticated model even if the current data lie away from the origin, were there enough domain knowledge. But that is something only the OP can provide. As always, the inference is trivial in principle: modelling is key.

illywhacker;
===
Subject: Re: Multivariate permutations?
> Hi everyone,
>
> I am having a bit of a problem figuring something out, and I was hoping I could get some help. I am an electronic engineer, so I have a little bit of maths background.
>
> Here is the problem:
>
> I have a set of signals to be mapped onto a set of transmission lines. I know how many transmission lines there are (n) and how many signals I need to map (k). Furthermore, I know that (p) transmission lines are faulty and should not be used to map the signals.
>
> Assuming a completely random process for both signal mapping and fault occurrence, how do I find the probability that all my signals will be mapped onto working transmission lines?
>
> I hope the problem is clear. I can do this by hand for a small n, k, p, and I have been trying to figure out a general formula, but no success so far!

Ignoring my previous attempt, which was flawed, the correct answer is the following.

Connect the signals one by one.

P(#1 is OK) = 1 - p/n

Given this, the faulty lines are *all* in the remaining n-1 unused connections.

P(#2 is OK | #1 is OK) = 1 - p/(n-1)

Given this, the faulty lines are all in the remaining n-2 unused connections.

P(#3 is OK | #1 and #2 are OK) = 1 - p/(n-2)

So the probability that all k signals are good is

(1 - p/n) * (1 - p/(n-1)) * (1 - p/(n-2)) * ... * (1 - p/(n-k+1))

I think classical statisticians would call this an urn problem without replacement.

rusty
===
Subject: Re: Kurtosis approximations

The reason this one is useful is that you can do it incrementally. Take x_1, calculate E(x^4), E(x^3), E(x^2), E(x). Take x_2, update your calculations (I'll spell it out below), and repeat through your whole data set. This should all be computationally well-behaved (and involves no subtractions). As the last step, do the addition and subtraction above. If your distribution is at all well-behaved (and centered near zero), then E(x^4) should never get too huge at any step along the way.
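The incremental scheme described in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: the "addition and subtraction above" is taken to mean the standard expansion of the fourth central moment in terms of raw moments (the formula itself is not part of this excerpt), kurtosis is taken as the non-excess ratio m4/m2^2, and the function names are hypothetical.

```python
def running_raw_moments(data):
    """Update E(x), E(x^2), E(x^3), E(x^4) one observation at a time."""
    m1 = m2 = m3 = m4 = 0.0
    for i, x in enumerate(data, start=1):
        # Running-mean update: new = old * (i-1)/i + term/i.
        # No subtraction of large sums is involved at this stage.
        w = (i - 1) / i
        m1 = m1 * w + x / i
        m2 = m2 * w + x**2 / i
        m3 = m3 * w + x**3 / i
        m4 = m4 * w + x**4 / i
    return m1, m2, m3, m4


def kurtosis_from_raw(m1, m2, m3, m4):
    """Final step: combine raw moments into the (non-excess) kurtosis.

    Assumed expansion: E[(x-mu)^4] = m4 - 4*mu*m3 + 6*mu^2*m2 - 3*mu^4.
    """
    variance = m2 - m1**2
    central4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return central4 / variance**2


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data for illustration
    print(kurtosis_from_raw(*running_raw_moments(sample)))
```

The only subtractions happen once, in `kurtosis_from_raw`, which is why the post suggests the data should be centered near zero: otherwise that final step can lose precision to cancellation.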
Yeah, I guess I'm solving a different problem here: this is great for when you don't know $\bar{x}$, because the data is coming to you incrementally. The creative reader can surely come up with other uses for this form.

But to get back to the original question, if you know $\bar{x}$, you can do the incremental updating trick on the basic definition of kurtosis, and should never overflow. Here's some pseudocode:

    mu = 0
    for (i = 1 to max_data) {
        mu = mu * (i-1)/i
        mu = mu + (x[i] - \bar{x})^4 / i
    }

So at each step, you update the mean to include one more term. That means you're never summing thousands of fourth-power terms, so you'll never overflow.
===
Subject: Re: Kurtosis approximations

Hi David,

I'm trying to minimize the kurtosis for a sample obtained by linear programming. Some solvers are designed to handle only linear equations for minimizing. However, if kurtosis needs to be minimized, we need to have an approximated equation which has only first-order powers (some solvers can use second order too). The L-moments look like a very nice

-Vinay
===
Subject: Re: prior distributions of estimated parameters

> As you can imagine, I do not agree with Reef Fish. He gives no reason why he does not like the reference I gave you. Here are the reasons why I do like it.

This was what I said to the OP,

RF> I would NOT recommend reading what illywhacker has written
RF> about the subject in this thread.

I did not even see what you referenced! It was what you had WRITTEN yourself, illywhacker, that was not worth reading. You seemed to have given a VERY GOOD reason why yourself:

> Jaynes' background is in physics, and he thinks like a scientist, not a statistician. My background too is in physics, and I appreciate his approach because of this.

Why should the OP learn about Bayesian STATISTICAL inference from statisticians when there are physicists to misrepresent the view?
-- Bob.
===
Subject: Re: prior distributions of estimated parameters

Reef Fish wrote in message
> This was what I said to the OP,
>
> RF> I would NOT recommend reading what illywhacker has written
> RF> about the subject in this thread.
>
> I did not even see what you referenced! It was what you had WRITTEN yourself, illywhacker, that was not worth reading.

You are right - I misread you. However, you have still given no reasons for why what I said was not worth reading, apart from the appeal to authority below, which I address separately. As you yourself are so fond of saying, let's talk about the statistical substance.

> You seemed to have given a VERY GOOD reason why yourself:
>
>> Jaynes' background is in physics, and he thinks like a scientist, not a statistician. My background too is in physics, and I appreciate his approach because of this.
>
> Why should the OP learn about Bayesian STATISTICAL inference from statisticians when there are physicists to misrepresent the view?

1) Because physicists, not statisticians, invented Bayesian inference. When it was invented there was no such thing as a statistician.

2) Because physicists, not statisticians, were responsible for the renaissance of Bayesian thinking, after decades of useless thinking on behalf of statisticians, and very much against their will.

3) Because statistics (or more correctly, inference) should be a tool for science, and yet statisticians often seem to have very little idea of how science actually works. Hence the disregard for statistics in the sciences that have too much history to need a statistical prop to feel important.
illywhacker;
===
Subject: Re: prior distributions of estimated parameters

> Reef Fish wrote in message
>> This was what I said to the OP,
>>
>> RF> I would NOT recommend reading what illywhacker has written
>> RF> about the subject in this thread.
>>
>> I did not even see what you referenced! It was what you had WRITTEN yourself, illywhacker, that was not worth reading.
>
> You are right - I misread you. However, you have still given no reasons for why what I said was not worth reading, apart from the appeal to authority below, which I address separately. As you yourself are so fond of saying, let's talk about the statistical substance.
>
>> You seemed to have given a VERY GOOD reason why yourself:
>>
>>> Jaynes' background is in physics, and he thinks like a scientist, not a statistician. My background too is in physics, and I appreciate his approach because of this.
>>
>> Why should the OP learn about Bayesian STATISTICAL inference from statisticians when there are physicists to misrepresent the view?
>
> 1) Because physicists, not statisticians, invented Bayesian inference. When it was invented there was no such thing as a statistician.

That's very interesting. ANOTHER physicist's misrepresentation? Reverend Thomas Bayes was a Presbyterian minister and a mathematician!

http://en.wikipedia.org/wiki/Thomas_Bayes

> 2) Because physicists, not statisticians, were responsible for the renaissance of Bayesian thinking, after decades of useless thinking on behalf of statisticians, and very much against their will.

ANOTHER interesting physicist's MISinformation and MISrepresentation. Do we know any of these physicists by name because of what they published in bringing about the renaissance of Bayesian thinking?

Have you heard of Lindley, de Finetti, Jeffreys, and L.J. Savage?
> 3) Because statistics (or more correctly, inference) should be a tool for science, and yet statisticians often seem to have very little idea of how science actually works. Hence the disregard for statistics in the sciences that have too much history to need a statistical prop to feel important.
>
> illywhacker;

A broad stroke as a spokesman for science after you have THOROUGHLY displayed your ignorance about Bayesian inference and statistics? Some of us statisticians might differ with your brash opinion.

Statistics by George Box as a worthwhile reading for statisticians, some of whom may well be in your stereotypical group.

But you are far too naive and ignorant to require any further DEBUNKING of the three reasons you so brashly put forth.

-- Bob.
===
Subject: Re: r-Squared Question

>> Rather it is usually defined as 1-ResSS/TSS (or RegSS/TSS),
>
> No. But it's equivalent to the usual RegSS/TotSS because RegSS + SSE (your ResSS) = TotSS.

Isn't that what "or" means, as in "3/6 or 1/2"?

>> If one uses the formal definition of R^2 to calculate it for this example, R^2 turns out to be -0.03, which says the problem is with the model, not R^2.
>
> This is your ERROR, Jerry. The definition of Multiple R^2 CANNOT lead to a negative value!

I'm not sure what the issue is here. R^2 cannot lead to a negative value in the land of sanity and least squares.

The poster was getting an R^2 of 1 for his ill-fitting model, which was not obtained by any least squares procedure. He calculated it as the square of the correlation between observed and predicted, and thought it showed a weakness in R^2 as a summary measure.

The problem was not with R^2, but with the poster's definition of it.

One can calculate RegSS, ResSS, and TotalSS. The poster's model was worse (in terms of least squared errors) than no model at all, that is, ResSS was greater than TotalSS. If one blindly plugs these numbers into a formula for R^2 one gets -0.03.
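The contrast between the two calculations can be sketched in a few lines. The data below are invented for illustration and are not the original poster's (so the result is -1 rather than -0.03); the function names are hypothetical. The predictions are perfectly correlated with the observations but shifted by a constant, so they are not a least squares fit.

```python
import math


def r2_from_sums_of_squares(y, y_hat):
    """1 - ResSS/TotSS, plugged in blindly for arbitrary predictions."""
    y_bar = sum(y) / len(y)
    tot_ss = sum((yi - y_bar) ** 2 for yi in y)
    res_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - res_ss / tot_ss


def r2_from_correlation(y, y_hat):
    """Square of the correlation between observed and predicted values."""
    y_bar = sum(y) / len(y)
    f_bar = sum(y_hat) / len(y_hat)
    cov = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, y_hat))
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    var_f = sum((fi - f_bar) ** 2 for fi in y_hat)
    return (cov / math.sqrt(var_y * var_f)) ** 2


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    # A "model" that is not a least squares fit: every prediction is 2 too high.
    predicted = [yi + 2.0 for yi in observed]
    print(r2_from_correlation(observed, predicted))      # squared correlation
    print(r2_from_sums_of_squares(observed, predicted))  # 1 - ResSS/TotSS
```

With these numbers ResSS is 20 and TotSS is 10, so the squared correlation reports a perfect fit while 1 - ResSS/TotSS is negative. The two definitions agree only when the predictions come from a least squares fit with an intercept.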
The point is that R^2 is not, in fact, deficient for suggesting the model is perfect. Rather, it is saying that something is very wrong with the model because it gives a negative value where such a thing should be impossible. One would hope that a measure of goodness-of-fit would go off the scale when assessing a model that (a) was derived under methods different from those the measure was designed to assess and (b) is worse than no model at all.
===
Subject: Re: r-Squared Question

>>> Rather it is usually defined as 1-ResSS/TSS (or RegSS/TSS),
>>
>> No. But it's equivalent to the usual RegSS/TotSS because RegSS + SSE (your ResSS) = TotSS.
>
> Isn't that what "or" means, as in "3/6 or 1/2"?

My "no" was referring to "it is usually defined as". I probably never read the book from which you got your definition, because I've NEVER seen R^2 DEFINED as 1-ResSS/TSS.

>>> If one uses the formal definition of R^2 to calculate it for this example, R^2 turns out to be -0.03, which says the problem is with the model, not R^2.
>>
>> This is your ERROR, Jerry. The definition of Multiple R^2 CANNOT lead to a negative value!
>
> I'm not sure what the issue is here. R^2 cannot lead to a negative value in the land of sanity and least squares.

Excuse me. Are we discussing statistics in Alice in Wonderland?

> The poster was getting an R^2 of 1 for his ill-fitting model, not obtained by any least squares procedure, by calculating it as the square of the correlation between observed and predicted and thought it showed a weakness in R^2 as a summary measure.
>
> The problem was not with R^2, but with the poster's definition of it.

Then why not tell it in Plain English that R^2 is a mathematical quantity that CANNOT possibly take on a negative value UNLESS someone is mangling it by introducing something improper! I mentioned the economists' use of Adjusted R^2 as another example of Quackery.

> One can calculate RegSS, ResSS, and TotalSS.
> The poster's model was worse (in terms of least squared errors) than no model at all, that is, ResSS was greater than TotalSS. If one blindly plugs these numbers into a formula for R^2 one gets -0.03. The point is that R^2 is not, in fact, deficient for suggesting the model is perfect. Rather, it is saying that something is very wrong with the model because it gives a negative value where such a thing should be impossible. One would hope that a measure of goodness-of-fit would go off the scale when assessing a model that (a) was derived under methods different from those the measure was designed to assess and (b) is worse than no model at all.

Your follow-up did not clarify or rectify the issue that whatever the OP did, it was statistical NONSENSE.

-- Bob.
===
Subject: Re: r-Squared Question

>>>> Rather it is usually defined as 1-ResSS/TSS (or RegSS/TSS),
>>>
>>> No. But it's equivalent to the usual RegSS/TotSS because RegSS + SSE (your ResSS) = TotSS.
>>
>> Isn't that what "or" means, as in "3/6 or 1/2"?
>
> My "no" was referring to "it is usually defined as". I probably never read the book from which you got your definition, because I've NEVER seen R^2 DEFINED as 1-ResSS/TSS.

I'm willing to concede the point, but for the fun of it I pulled four texts from my shelf:

Draper & Smith, 2nd: RegSS/TotSS, as Percentage Variation Explained
Netter et al., latest ed: R^2 = RegSS/TSS = 1-ResSS/TSS
Kleinbaum et al., latest: (RegSS-ResSS)/TotSS
Searle: the square of the cc between observed and predicted!

>>>> If one uses the formal definition of R^2 to calculate it for this example, R^2 turns out to be -0.03, which says the problem is with the model, not R^2.
>>>
>>> This is your ERROR, Jerry. The definition of Multiple R^2 CANNOT lead to a negative value!
>>
>> I'm not sure what the issue is here. R^2 cannot lead to a negative value in the land of sanity and least squares.
>
> Excuse me. Are we discussing statistics in Alice in Wonderland?

In this instance, yes!
> Then why not tell it in Plain English that R^2 is a mathematical quantity that CANNOT possibly take on a negative value UNLESS someone is mangling it by introducing something improper! I mentioned the economists' use of Adjusted R^2 as another example of Quackery.
>
> Your follow-up did not clarify or rectify the issue that whatever the OP did, it was statistical NONSENSE.

You might look at it that way. You might also look at it as answering the question, "How does this measure work if applied to arbitrary models?" and leaving it to the reader to draw his/her own inference about R^2 = -0.03.
===
Subject: Re: r-Squared Question

>>>>> Rather it is usually defined as 1-ResSS/TSS (or RegSS/TSS),
>>>>
>>>> No. But it's equivalent to the usual RegSS/TotSS because RegSS + SSE (your ResSS) = TotSS.
>>>
>>> Isn't that what "or" means, as in "3/6 or 1/2"?
>>
>> My "no" was referring to "it is usually defined as". I probably never read the book from which you got your definition, because I've NEVER seen R^2 DEFINED as 1-ResSS/TSS.
>
> I'm willing to concede the point, but for the fun of it I pulled four texts from my shelf:
>
> Draper & Smith, 2nd: RegSS/TotSS, as Percentage Variation Explained

So THEY contributed to the misconception and TWO ERRORS (Percentage

> Netter et al., latest ed: R^2 = RegSS/TSS = 1-ResSS/TSS

I've taught from Neter et al (several editions) and R^2 was always DEFINED as RegSS/TotSS. Yours must've been some Netter. :-)

> Kleinbaum et al., latest: (RegSS-ResSS)/TotSS

IMPOSSIBLE! It's WRONG. That's not R^2 at all. I assume it's your copying error.

> Searle: the square of the cc between observed and predicted!

That's a baddy, as a definition.

>>>>> If one uses the formal definition of R^2

What "formal definition", Jerry? Now that you've listed three (and one typo) from statistics textbooks?

>>>>> to calculate it for this example, R^2 turns out to be -0.03, which says the problem is with the model, not R^2.
>>>>
>>>> This is your ERROR, Jerry.
>>>> The definition of Multiple R^2 CANNOT lead to a negative value!
>>>
>>> I'm not sure what the issue is here. R^2 cannot lead to a negative value in the land of sanity and least squares.
>>
>> Excuse me. Are we discussing statistics in Alice in Wonderland?
>
> In this instance, yes!

Actually Beyond Alice in Wonderland! :-) See above references to Kleinbaum, your "formal definition" of R^2 and R^2 = -.03.

>> Then why not tell it in Plain English that R^2 is a mathematical quantity that CANNOT possibly take on a negative value UNLESS someone is mangling it by introducing something improper! I mentioned the economists' use of Adjusted R^2 as another example of Quackery.
>>
>> Your follow-up did not clarify or rectify the issue that whatever the OP did, it was statistical NONSENSE.
>
> You might look at it that way.

There's no other valid way to look at it, Jerry.

> You might also look at it as answering the question, "How does this measure work if applied to arbitrary models?" and leaving it to the reader to draw his/her own inference about R^2 = -0.03.

How does WHAT measure work? There is some weak excuse for using the Searle-like definition to get some correlation, but even Searle's definition would NOT yield a NEGATIVE number, unless Searle can get a complex number such as i*sqrt(0.03) as a correlation.

-- Bob.
===
Subject: Re: r-Squared Question

>> Netter et al., latest ed: R^2 = RegSS/TSS = 1-ResSS/TSS
>
> I've taught from Neter et al (several editions) and R^2 was always DEFINED as RegSS/TotSS. Yours must've been some Netter. :-)

Need a big net to catch a big fish.

I am copying verbatim from the third edition (the latest is at the office), p 100:

Thus SSTO is a measure of uncertainty in predicting Y when X is not considered. Similarly, SSE measures the variation in the Y(i) when a regression model using the independent variable X is employed.
A natural measure of the effect of X in reducing the variation in Y, i.e., the uncertainty in predicting Y, is therefore:

(3.71)  r^2 = (SSTO - SSE)/SSTO = SSR/SSTO = 1 - SSE/SSTO

Also, p 241:

The coefficient of multiple determination, denoted R^2, is defined as follows:

(7.35)  R^2 = SSR/SSTO = 1 - SSE/SSTO

It measures the proportionate reduction of total variation...
===
Subject: Re: r-Squared Question

> Kleinbaum et al., latest: (RegSS-ResSS)/TotSS

Should have been

Kleinbaum et al., latest: (TotSS-ResSS)/TotSS
===
Subject: Re: model selection problem

> Hello experts,
>
> I want to select the optimal distribution from a set of assumable models. The selection criterion will be AIC or BIC, which is based on the empirical LogLik.
>
> Now my problem:
>
> I've made experiments with the Log-Gamma-Distribution. This DF in some cases could be the best fit for my problem. But - as I think due to the log(x) as input - the LogLik for this function is very very small compared with the LogLik of other DFs fitted to the same dataset (e.g. Weibull, Lognormal etc). Following a LogLik-derived criterion, the LG-Distr should be best fitting in EVERY case. Graphics show me that it sometimes does, but at other times does not.
>
> Where is the error, or how could I make the Log-Gamma comparable to the others?
>
> Carsten
>
> P.S.: The LG is the only DF that is not ready-made in my stat-Software (R). And so I use DFgamma(log(x)) to generate it.

Carsten,

Log Likelihood incorporates probabilities rather than actual values of x or log(x). Therefore, I cannot see a reason why the Log Likelihood would get smaller or higher just because you take log of x.

HTH,
Vadim Pliner
===
Subject: Re: model selection problem

> Carsten,
>
> Log Likelihood incorporates probabilities rather than actual values of x or log(x). Therefore, I cannot see a reason why the Log Likelihood would get smaller or higher just because you take log of x.
>
> HTH,
> Vadim Pliner

Hi Vadim,

so to sum up your posting ...
a higher likelihood-value should indicate a better fit in any way (at least in theory)?

And an additional question: What I need especially is a good fit in the right tail of the distribution. Are there modifications to likelihood criteria to focus more on tail-exactness?

Carsten
===
Subject: Re: model selection problem

>> Carsten,
>>
>> Log Likelihood incorporates probabilities rather than actual values of x or log(x). Therefore, I cannot see a reason why the Log Likelihood would get smaller or higher just because you take log of x.
>>
>> HTH,
>> Vadim Pliner
>
> Hi Vadim,
>
> so to sum up your posting ... a higher likelihood-value should indicate a better fit in any way (at least in theory)?
>
> And an additional question: What I need especially is a good fit in the right tail of the distribution. Are there modifications to likelihood criteria to focus more on tail-exactness?
>
> Carsten

-----------

> so to sum up your posting ... a higher likelihood-value should indicate a better fit in any way (at least in theory)?

Not necessarily. You also have to take into account the DF's number of parameters. The higher the value of log likelihood, the better the distribution function fits the data. However, you cannot simply pick the one yielding the highest likelihood if the distributions you select from have different numbers of parameters. Although a higher likelihood means a better model for the observed data, a higher number of parameters causes weaker predictability for new cases. It is a good idea to use either the AIC or the BIC criterion, which you were going to use anyway.

> And an additional question: What I need especially is a good fit in the right tail of the distribution. Are there modifications to likelihood criteria to focus more on tail-exactness?
I guess you could assign higher weights to the elements of log likelihood corresponding to observations in the right tail (longer lived subjects, if we are talking in the context of survival analysis) and then maximize this weighted log likelihood.

Vadim
===
Subject: Re: model selection problem

>>> Carsten,
>>>
>>> Log Likelihood incorporates probabilities rather than actual values of x or log(x). Therefore, I cannot see a reason why the Log Likelihood would get smaller or higher just because you take log of x.
>>>
>>> HTH,
>>> Vadim Pliner
>>
>> Hi Vadim,
>>
>> so to sum up your posting ... a higher likelihood-value should indicate a better fit in any way (at least in theory)?
>>
>> And an additional question: What I need especially is a good fit in the right tail of the distribution. Are there modifications to likelihood criteria to focus more on tail-exactness?
>>
>> Carsten
>
> -----------
>
>> so to sum up your posting ... a higher likelihood-value should indicate a better fit in any way (at least in theory)?
>
> Not necessarily. You also have to take into account the DF's number of parameters. The higher the value of log likelihood, the better the distribution function fits the data. However, you cannot simply pick the one yielding the highest likelihood if the distributions you select from have different numbers of parameters. Although a higher likelihood means a better model for the observed data, a higher number of parameters causes weaker predictability for new cases. It is a good idea to use either the AIC or the BIC criterion, which you were going to use anyway.

OK. That's my intention. Your text shows me that my initial idea was right.

>> And an additional question: What I need especially is a good fit in the right tail of the distribution. Are there modifications to likelihood criteria to focus more on tail-exactness?
> I guess you could assign higher weights to the elements of log likelihood corresponding to observations in the right tail (longer lived subjects, if we are talking in the context of survival analysis) and then maximize this weighted log likelihood.

Does somebody here have an idea HOW to weight the likelihood? I think an exponentiation will work fine for the tail exactness, but maybe somebody has more details or literature? Google research didn't help me further :-(

Tnx,
Carsten
===
Subject: Heteroskedasticity problem

Hi all,

I am trying to find out whether a heteroskedasticity problem can be tackled by some kind of variable transformation in the OLS framework.

Sharad