Wednesday, July 08, 2009

Predicting Baseball Injuries

This piece in the Times about the Dodgers trying to apply statistics to predicting player injuries is interesting, but I have some issues with the approach.

The example given is not that remarkable: the trainer warns the team against signing a reliever coming off two straight seasons of 80+ appearances, the reliever signs with another team, and he breaks down the next season. One wouldn't need any "logarithmic models" (the piece uses the term) to predict that wear and tear causes injuries.

In a very real sense, the probability of an injury over a baseball career is 1.00. Maybe .99, to account for a few Men of Steel and Horses of Iron. So what you are really interested in is survival, i.e., how long a given player will go before a serious injury. That, I'm guessing, is why the article talks about insurance. This is like life insurance. Replace injury with death, and you have the question the insurance companies ask: not whether, but when.
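
A minimal sketch of what that survival framing looks like in practice, using the standard Kaplan-Meier estimator. The data are made up purely for illustration, and the names (`kaplan_meier`, `seasons`, `injured`) are hypothetical. A player who was still healthy when observation stopped is treated as censored rather than as never injured.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an injury.

    durations: seasons until first serious injury, or until observation ended
    observed:  True if the injury happened, False if the player was censored
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)      # players still healthy and still being watched
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        injuries = 0
        leaving = 0
        # Group everyone whose clock stops at the same time t.
        while i < len(events) and events[i][0] == t:
            injuries += events[i][1]   # True counts as 1
            leaving += 1
            i += 1
        if injuries:
            survival *= 1 - injuries / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented example: eight players, seasons until injury or end of observation.
seasons = [1, 2, 2, 3, 4, 4, 5, 6]
injured = [True, True, False, True, True, False, True, False]

km_curve = kaplan_meier(seasons, injured)
for t, s in km_curve:
    print(f"after season {t}: {s:.3f} still uninjured")
```

The curve only ever goes down, which is the point: the interesting quantity is how fast it drops for a given kind of player, not whether it eventually reaches zero.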

But I am skeptical that many of the factors listed in the article are related to injury. The article mentions ethnic background, hypothesizing that certain Latin nationals are more durable. I doubt that this holds up after controlling for other factors.

My guess is that you could get pretty far with (1) the player's age; (2) number of previous trips to the disabled list (DL); and (3) games played last season (or over the last two seasons; I'd want to try both). I wonder how well that model would work. Maybe add player position, since certain positions may be more likely to lead to injury.
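
One way to try the three-variable idea is a plain logistic regression on whether a player lands on the DL the following season. This is only a sketch under stated assumptions: the post does not say which model to use, the twelve player rows are invented, and every name here (`fit_logistic`, `players`, `injured_next_season`) is hypothetical. It uses only the standard library, with gradient descent standing in for a proper statistics package.

```python
import math


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def predict(weights, row):
    """Predicted injury probability; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, rows, labels):
    """Average negative log-likelihood (lower is better)."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict(weights, row), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def fit_logistic(rows, labels, steps=2000, rate=0.1):
    """Fit by batch gradient descent, starting from all-zero weights."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = predict(weights, row) - y
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Invented rows: (age, previous DL trips, games played last season)
players = [
    (24, 0, 150), (26, 0, 120), (27, 1, 155), (29, 1, 140),
    (30, 2, 160), (31, 0, 100), (33, 3, 145), (34, 2, 158),
    (35, 1, 90), (28, 2, 130), (25, 1, 162), (36, 3, 110),
]
injured_next_season = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]

X = standardize(players)
weights = fit_logistic(X, injured_next_season)
print("intercept, age, DL trips, games:", [round(w, 2) for w in weights])
print("log loss:", round(log_loss(weights, X, injured_next_season), 3))
```

To try the "last season versus last two seasons" question, one could swap the third column and compare the loss on players held out of the fit. With real data, a survival model would match the time-to-injury framing better than a yes/no outcome.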

Also, note that many baseball injuries are, shall we say, random events. Crashing into the wall, getting hit with a pitch, a pitcher taking a line drive off his throwing arm . . . these are always going to be in the residual. There's no way to predict these--unless they are closely enough associated with certain positions that you can estimate the probability that an outfielder, say, will collide with a fellow player? Maybe.

There may also be a baseline probability, shared by all batters, of being seriously injured by a pitch. In fact, there must be, right? But it would be a constant (more or less?). Or a function of at bats?
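
If the risk per trip to the plate were roughly constant and each trip independent (both assumptions, not established facts), then the chance of at least one such injury over a season would grow with at bats as 1 - (1 - p)^n. A small sketch, where `season_injury_risk` is a hypothetical name and the per-at-bat figure is invented, not an estimate:

```python
def season_injury_risk(per_at_bat_risk, at_bats):
    """Chance of at least one serious hit-by-pitch injury in `at_bats` trips,
    assuming a constant, independent risk on every trip."""
    return 1.0 - (1.0 - per_at_bat_risk) ** at_bats


# Invented figure: a 1-in-10,000 risk per trip to the plate.
p = 0.0001
for n in (100, 300, 600):
    print(n, "at bats:", round(season_injury_risk(p, n), 4))
```

For small p this is close to n times p, so the baseline would be nearly proportional to playing time, which is one reason games played might pick up some of it.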

I need to think about this some more.


