Originally Posted by conalthomas
Well, actually, my point is that comps aren't likely to work, so the last thing I would want to do is come up with some limited sample set of my own that does no better a job of predicting future outcomes. And I contend that there is value brought to the discussion: my argument is that predicting the career trajectory of a player like Swanson from the career trajectories of six other players who are similar on a few particular parameters has quite limited utility.
I mean, look at the comparisons you describe: you have a spread of career WAR from some negative value all the way up to 31.4 in just six samples. What would it look like if you had a hundred samples? Having looked at lots and lots of data sets (not to do with baseball), my very strong feeling is that the spread would be even greater with more data, and I know you really can't draw any kind of conclusion about the shape of the distribution from just six samples. This data says to me: OK, Swanson could have a career WAR somewhere between some negative value and probably something higher than 31.4 (because what are the chances you have captured the true range of possibilities with six samples?), possibly quite a bit higher. And with just six samples, we certainly can't conclude from this data set that a low career WAR like 2 or 6 or 8 is any more likely than a very different career WAR of 20-30 (or more).
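To put a rough number on the six-sample point, here is a minimal sketch. It assumes career WAR values behave like independent draws from one continuous distribution, which is an assumption made for illustration and not something established in this thread. Under that assumption, the chance that one more player lands outside the min-max range of n earlier players is 2/(n+1), whatever the shape of the distribution, so for six comps it is 2/7, or about 29%. The standard normal used in the simulation is only a placeholder, not real baseball data, and the function names are hypothetical.

```python
import random


def prob_outside_range(n):
    """Exact chance that one more i.i.d. continuous draw falls outside
    the min-max range of n earlier draws (any of n+1 ranks is equally likely)."""
    return 2 / (n + 1)


def simulate_outside_range(n, trials=20000, seed=0):
    """Monte Carlo check of prob_outside_range using placeholder normal draws."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        new_value = rng.gauss(0, 1)
        if new_value < min(sample) or new_value > max(sample):
            outside += 1
    return outside / trials


def mean_sample_range(n, trials=2000, seed=0):
    """Average observed spread (max - min) across many samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials


if __name__ == "__main__":
    print("exact, n=6:", prob_outside_range(6))
    print("simulated, n=6:", simulate_outside_range(6))
    print("mean range, n=6 vs n=100:", mean_sample_range(6), mean_sample_range(100))
```

The last line compares the average observed spread at six samples and at a hundred, which is the "spread would be even greater with more data" point: the observed range can only widen as more players are added.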
What I was really hoping you'd say (in my perfect version of the world) is something like, 'I can understand your skepticism, but the method of using comps like this has proven to be quite successful in making future projections,' along with a link to an analysis that showed this. I am truly skeptical that the data set you provided has much in the way of predictive utility. I can absolutely promise you that in professional data analysis circles, predictive statements based on a data set like this would be laughed at. But here's the thing: I don't spend nearly as much time looking at baseball stats as you do. I could do an analysis of the predictive utility of comps, and even do an analysis of variance that would let us weight the various parameters used in the comparisons, but I don't want to do that. I'm hoping that someone has already done it. I see comps used a lot in baseball, and what I'm hoping is that you know of something that shows they actually mean anything.