Early Detection of Injuries in MLB Pitchers from Video
AJ Piergiovanni and Michael S. Ryoo
Department of Computer Science, Indiana University, Bloomington, IN 47408
{ajpiergi,mryoo}@indiana.edu
Abstract
Injuries are a major cost in sports. Teams spend millions of dollars every year on players who are hurt and unable to play, resulting in lost games, decreased fan interest, and additional wages for replacement players. Modern convolutional neural networks have been successfully applied to many video recognition tasks. In this paper, we introduce the problem of injury detection/prediction in MLB pitchers and experimentally evaluate the ability of such convolutional models to detect and predict injuries in pitches only from video data. We conduct experiments on a large dataset of TV broadcast MLB videos of 20 different pitchers who were injured during the 2017 season. We experimentally evaluate the model's performance on each individual pitcher, how well it generalizes to new pitchers, how it performs for various injuries, and how early it can predict or detect an injury.
1. Introduction
Injuries in sports are a major cost. When a star player is hurt, the team not only continues paying the player, but also suffers in performance and fan interest. In the MLB, teams spend an average of $450 million per year on players on the disabled list and an additional $50 million on replacement players, roughly $500 million annually [5]. In baseball, pitcher injuries are among the most costly and common, estimated as high as $250 million per year [4], about half the total cost of injuries in the MLB. As a result, there are many studies on the causes, effects, and recovery times of injuries caused by pitching. Mehdi et al. [20] studied the duration of lat injuries (back muscle and tendon area) in pitchers, finding an average recovery time of 100 days without surgery and 140 days for pitchers who needed surgery. Marshall et al. [19] found pitchers with core injuries took an average of 47 days to recover, and 37 days for hip/groin injuries. These injuries not only affect the pitcher, but can also result in the team losing games and revenue. Pitching is a repetitive action; starting pitchers throw roughly 2500 pitches per season in games alone, far more when including warm-ups, practice, and spring training. Due to such high use, injuries in pitchers are often caused by overuse [17], and early detection of injuries could reduce severity and recovery time [12, 11].
Modern computer vision models, such as convolutional neural networks (CNNs), allow machines to make intelligent decisions directly from visual data. Training a CNN to accurately detect injuries in pitches purely from video data would be extremely beneficial to teams and athletes, as it requires no sensors, tests, or monitoring equipment other than a camera. A CNN trained on videos of pitchers would be able to detect slight changes in their form that could be an early sign of an injury, or even cause an injury. The use of computer vision to monitor athletes can provide team physicians, trainers, and coaches additional data to monitor and protect athletes.
CNN models have already been successfully applied to many video recognition tasks, such as activity recognition [3], activity detection [27], and recognition of activities in baseball videos [26]. In this paper, we introduce the problem of injury detection/prediction in MLB pitchers and experimentally evaluate the ability of CNN models to detect and predict injuries in pitches from only video data.
2. Related Work
Video/Activity Recognition Video activity recognition is a popular research topic in computer vision [1, 14, 33, 35, 30]. Early works focused on hand-crafted features, such as dense trajectories [35], and showed promising results. Recently, convolutional neural networks (CNNs) have outperformed the hand-crafted approaches [3]. A standard multi-stream CNN approach takes as input RGB frames and optical flow [33, 28], or RGB frames at different frame rates [8], capturing different features for classification. 3D (spatio-temporal) convolutional models have been trained for activity recognition tasks [34, 3, 24]. To train these CNN models, large-scale datasets such as Kinetics [15] and Moments-in-Time [21] have been created. Other works have explored using CNNs for temporal activity detection [24, 32, 6] and studied the use of temporal structure or pooling for recognition [25, 23]. Current CNNs
are able to perform quite well on a variety of video-based
recognition tasks.
Injury detection and prediction Many works have studied prediction and prevention of injuries in athletes by developing models based on simple data (e.g., physical stats or social environment) [2, 13] or cognitive and psychological factors (stress, life support, identity, etc.) [18, 7]. Others made predictions based on strength measured before a season [29]. Placing sensors on players to monitor their movements has been used to detect pitching events, but not for injury detection or prediction [16, 22]. Further, sonography (ultrasound) of elbows has been used by human experts to detect injuries [10].
To the best of our knowledge, there is no work exploring real-time injury detection in game environments. Further, our approach requires no sensors other than a camera. Our model makes predictions from only the video data.
3. Data Collection
Modern CNN models require a sufficient amount of data (i.e., samples) for both their training and evaluation. As pitcher injuries are fairly rare, especially compared to the number of pitches thrown while not injured, the collection and preparation of data is extremely important. It is necessary to take the best advantage of such example videos while removing non-pitch-related bias in the data.
In this work, we consider the task of injury prediction as a binary classification problem. That is, we label a video clip of a pitch either as 'healthy' or 'injured'. We assume the last k pitches thrown before a pitcher was placed on the disabled list to be 'injured' pitches. If an injury occurred during practice or another non-game environment, we do not include that data in our dataset (as we do not have access to video data outside of games). We then collect TV broadcast videos of the pitchers from several games not near the date of injury, as well as the game they were injured in. This provides sufficient 'healthy' as well as 'injured' pitching video data.
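As a concrete illustration, the following is a minimal sketch of this labeling rule; the function name and the `date` record field are illustrative, not from our implementation.

```python
# Minimal sketch of the labeling rule: the last k pitches thrown before
# the disabled-list (DL) placement are 'injured' (1), all others 'healthy' (0).
# `pitches` is a hypothetical chronologically ordered list of per-pitch records.

def label_pitches(pitches, dl_date, k=20):
    # indices of pitches thrown on or before the DL placement, in order
    before_dl = [i for i, p in enumerate(pitches) if p["date"] <= dl_date]
    injured = set(before_dl[-k:])  # the last k of them are labeled 'injured'
    return [1 if i in injured else 0 for i in range(len(pitches))]
```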
The challenge in our dataset construction is that simply taking the broadcast videos and (temporally) segmenting each pitch interval is not sufficient. We found that the model often overfits to the pitch count on the scoreboard, the teams playing, or the exact pitcher location in the video (as camera position can vary slightly between ballparks and games), rather than focusing on the actual pitching motion. Spatially cropping the pitching region is also insufficient, as there can be an abundance of irrelevant information in the spatial bounding box. The model then overfits to the jersey of the pitcher, the time of day or weather (based on brightness, shadows, etc.), or even a fan in a colorful shirt in the background (see Fig. 1 for examples). While superstitious fans may find these factors meaningful, they do not have any real impact on the pitcher or his injuries.
Figure 1. Examples of (top-left) the pitch count being visible, (bottom-left) uniquely colored shirts in the background, and (right) the same pitcher in different uniforms and ballparks. These features allow the model to overfit to data that will not generalize or properly represent whether or not the pitcher is injured.
To address all these challenges, we first crop the videos to a bounding box containing just the pitcher. We then convert the images to greyscale and compute optical flow, as it captures high-resolution motion information while being invariant to appearance (i.e., jersey, time of day, etc.). Optical flow has commonly been used for activity detection tasks [33] and is beneficial as it captures appearance-invariant motion features [31]. We then use the optical flow frames as input to a CNN model trained for our binary injury classification task. This allows the model to predict a pitcher's injury based solely on motion information, ignoring the irrelevant features. Examples of the cropped frames and optical flows are shown in Fig. 2.
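A minimal sketch of this preprocessing pipeline is shown below, using OpenCV. We do not specify the flow algorithm here, so Farneback flow stands in as one option; the pitcher bounding box is assumed to be given (e.g., from a person detector), and the clipping range is an illustrative normalization choice.

```python
import cv2
import numpy as np

def pitch_clip_to_flow(frames, box):
    """Crop each frame to the pitcher bounding box, convert to greyscale,
    and compute dense optical flow between consecutive frames.
    `frames` is a list of HxWx3 uint8 RGB frames; `box` is (x, y, w, h)."""
    x, y, w, h = box
    grey = [cv2.cvtColor(f[y:y + h, x:x + w], cv2.COLOR_RGB2GRAY)
            for f in frames]
    flows = []
    for prev, nxt in zip(grey[:-1], grey[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        # clip large displacements and rescale to [-1, 1] for the CNN input
        flows.append(np.clip(flow, -20, 20) / 20.0)
    return np.stack(flows)  # shape (T-1, h, w, 2): x/y flow channels
```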
Dataset Our dataset consists of pitches from broadcast videos of 30 games from the 2017 MLB season. It contains injuries from 20 different pitchers, 4 of whom had multiple injuries in the same season. Each pitcher has an average of 100 healthy pitches from games not near the date of injury, as well as pitches from the game in which they were injured. The data contains 12 left-handed pitchers and 8 right-handed pitchers, and 10 different injuries (back strain, arm strain, finger blister, shoulder strain, UCL tear, intercostal strain, sternoclavicular joint, rotator cuff, hamstring strain, and groin strain). There are 5479 pitches in the dataset, about 273 per pitcher, providing sufficient data to train a video CNN. When using k = 20, there are 469 'injured' pitches and 5010 'healthy' pitches, as some pitchers threw fewer than 20 pitches before being injured.
Figure 2. Example of cropped RGB frames and the computed optical flows.
4. Approach
We use a standard 3D spatio-temporal CNN trained on the optical flow frames. Specifically, we use I3D [3] with 600 optical flow frames as input at a resolution of 460×600 (cropped to just the pitcher from 1920×1080 video), taken from a 10-second clip of a pitch at 60 fps. We use high frame-rate, high-resolution inputs to allow the model to learn the very small differences between 'healthy' and 'injured' pitches. We initialize I3D with the optical flow stream pre-trained on the Kinetics dataset [15] to obtain good initial weights.
We train the model to minimize the binary cross entropy:

$$L = -\sum_i \big( y_i \log p_i + (1 - y_i) \log(1 - p_i) \big) \qquad (1)$$

where $y_i$ is the label (injured or not) and $p_i$ is the model's prediction for sample $i$. We train for 100 epochs with a learning rate of 0.1 that is decayed by a factor of 10 every 25 epochs. We use dropout set at 0.5 during training.
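The sketch below shows this training setup in PyTorch. `InceptionI3d` is assumed to come from a public I3D implementation (it is not part of torchvision), `flow_kinetics.pt` is a placeholder path for the Kinetics flow-stream weights, and `loader` is assumed to yield (clip, label) batches with clips shaped (B, 2, T, H, W).

```python
import torch
import torch.nn as nn

# Assumed external I3D implementation; the logits layer is replaced with a
# single output for binary injury classification, and dropout (p = 0.5) is
# assumed to sit in the classification head, as described above.
model = InceptionI3d(num_classes=1, in_channels=2)
model.load_state_dict(torch.load("flow_kinetics.pt"), strict=False)

criterion = nn.BCEWithLogitsLoss()  # the binary cross entropy of Eq. (1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# decay the learning rate by a factor of 10 every 25 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)

for epoch in range(100):
    for clip, label in loader:
        logits = model(clip).squeeze(-1)  # assumed output shape (B, 1)
        loss = criterion(logits, label.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```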
5. Experiments
We conduct extensive experiments to determine what the model (i.e., I3D video CNN) is capable of learning and how well it generalizes. We compare (1) models learned per pitcher, testing how well they generalize to other pitchers, (2) models learned from a set of left-handed or right-handed pitchers, and (3) models trained on a set of several pitchers. We evaluate on both seen and unseen pitchers, and on seen and unseen injuries. We also compare models trained on specific injury types (e.g., back strain, UCL injury, finger blisters, etc.) and analyze how early we can detect an injury solely from video data.
Since this is a binary classification task, we report the following evaluation metrics:
- Accuracy (correct examples / total examples)
- Precision (correct injured / predicted injured)
- Recall (correct injured / total injured)
- $F_1$ scores
Pitcher              Acc   Prec   Rec   F1
Aaron Nola .92 .50 .34 .42
Clayton Kershaw .96 1. .75 .86
Corey Kluber .95 1. .33 .50
Aroldis Chapman .83 .50 1. .67
Boone Logan .98 .96 .97 .97
Brett Anderson .96 .75 .97 .85
Brandon Finnegan .95 .75 .99 .87
Austin Brice .87 .89 .75 .82
AJ Griffin .91 .98 .33 .50
Adalberto Mejia .96 .83 .78 .81
Average .93 .82 .72 .75
Table 1. Results of predicting a pitcher’s injury where the last 20
pitches thrown are injured. A model was trained for each pitcher.
The model performs well for some pitchers and poorly for others.
where

$$F_1 = \frac{1}{0.5\left(\frac{1}{\text{recall}} + \frac{1}{\text{precision}}\right)} \qquad (2)$$

All values are measured between 0 and 1, where 1 is perfect for the given measure.
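For concreteness, these four metrics can be computed from binary label/prediction vectors as in the following sketch (NumPy; the zero-division guards are an implementation choice, not part of the definitions above).

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary injury labels
    (1 = 'injured', 0 = 'healthy'), matching the definitions above."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted injured
    acc = np.mean(y_pred == y_true)
    prec = tp / max(np.sum(y_pred == 1), 1)  # correct injured / predicted injured
    rec = tp / max(np.sum(y_true == 1), 1)   # correct injured / total injured
    # harmonic mean of precision and recall, equivalent to Eq. (2)
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) > 0 else 0.0
    return acc, prec, rec, f1
```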
5.1. Per-player model
We first train a model for each pitcher in the dataset. We consider the last 20 pitches thrown by a pitcher before being placed on the disabled list (DL) as 'injured' and all other pitches thrown by the pitcher as 'healthy'. We use half the 'healthy' pitches and half the 'injured' pitches as training data and the other half as test data. All the pitchers in the test dataset were seen during training. In Table 1, we compare the results of our model for 10 different pitchers. For some pitchers, such as Clayton Kershaw or Boone Logan, the model was able to accurately detect their injury, while for other pitchers, such as Aaron Nola, the model was unable to reliably detect the injury.
To determine how well the models generalize, we evaluate each trained model on a different pitcher. Our results are shown in Table 2.
Train Pitcher    Test Pitcher    Acc   Prec   Rec   F1
Liberatore Wainwright .33 .17 .87 .27
Wainwright Liberatore .29 .23 .85 .24
Wood Liberatore .63 0. 0. 0.
Chapman Finnegan .75 .23 .18 .19
Brice Wood .43 .17 .50 .24
Wood Brice .42 .15 .48 .22
Chapman Bailey .23 0. 0. 0.
Mejia Kluber .47 0. 0. 0.
Table 2. Comparing how well a model trained on one pitcher transfers to another pitcher. As throwing motions vary greatly between pitchers, the models do not generalize well.
Arm    Acc   Prec   Rec   F1
Lefty .95 .85 .77 .79
Righty .94 .81 .74 .74
All pitchers .91 .75 .73 .74
Table 3. Classification of pitcher injury trained on the 12 left-handed or 8 right-handed pitchers and evaluated on held-out data.
We find that for some pitchers the transfer works reasonably well, such as Liberatore and Wainwright, or Brice and Wood. However, for other pitchers, it does not generalize at all. This is not surprising, as pitchers have various throwing motions, use different arms, etc.; in fact, it is quite interesting that it generalizes at all.
5.2. By pitching arm
To further examine how well the model generalizes, we train the model on the 12 left-handed (or 8 right-handed) pitchers; half the data is used for training and the other half of held-out pitches is used for evaluation. This allows us to determine if the model is able to learn injuries and throwing motions for multiple pitchers, or if it must be specific to each pitcher. Here, all the test data is of pitchers seen during training. We also train a model on all 20 pitchers and test on held-out pitches. Our results are shown in Table 3. We find that these models perform similarly to the per-pitcher models, suggesting that the model is able to learn multiple pitchers' motions. Training on all pitchers does not improve performance, likely because left-handed and right-handed pitchers have very different throwing motions.
Unseen pitchers In Table 4 we report results for a model trained on 6 left-handed (or 4 right-handed) pitchers and tested on the other 6 left-handed (or 4 right-handed) pitchers not seen during training. For these experiments, the last 20 pitches thrown were considered 'injured'. We find that when training with more pitcher data, the model generalizes better than when transferring from a single pitcher, but still performs quite poorly. Further, training on both left-handed and right-handed pitchers reduces performance.
Arm    Acc   Prec   Rec   F1
Lefty .42 .25 .62 .43
Righty .38 .22 .54 .38
All pitchers .58 .28 .43 .35
Table 4. Classification of pitcher injury trained on 6 left-handed or 4 right-handed pitchers and evaluated on the other 6 left-handed or 4 right-handed pitchers.
Arm    Acc   Prec   Rec   F1
Lefty .67 .62 .65 .63
Righty .71 .65 .68 .67
All pitchers .82 .69 .72 .71
Table 5. Classification performance of our model when trained using 'healthy' + 'injured' pitch data from half of the pitchers and some 'healthy' pitches from the other half. The model was applied to unseen pitches from the other half of the pitchers to measure performance. This confirms that the model can generalize to unseen pitchers' injuries using only 'healthy' examples of them.
This suggests that models will be unable to predict injuries for pitchers they have not seen before, and that left-handed and right-handed pitchers should be treated separately.
To determine if a model needs to see a specific pitcher injured before it can detect that pitcher's injury, we train a model with 'healthy' and 'injured' pitches from 6 left-handed pitchers (4 right-handed), and only 'healthy' pitches from the other 6 left-handed (4 right-handed) pitchers. We use half of the unseen pitchers' 'healthy' pitches as training data, and all 20 unseen 'injured' pitches plus the other half of the unseen 'healthy' pitches as testing data. Our results are shown in Table 5, confirming that training in this manner generalizes to the unseen pitchers' injuries, nearly matching the performance of the models trained on all the pitchers (Table 3). This suggests that the models can predict pitcher injuries even without seeing a specific pitcher with an injury.
Lefty vs. Righty Models To further test how well the model generalizes, we evaluate the model trained on left-handed pitchers on the right-handed pitchers, and similarly the right-handed model on left-handed pitchers. We also try horizontally flipping the input images, effectively making a left-handed pitcher appear as a right-handed pitcher (and vice versa). Our results, shown in Table 6, show that the learned models do not generalize to pitchers throwing with the other arm, but by flipping the image, the models generalize significantly better, giving performance comparable to unseen pitchers (Table 4). By additionally including flipped 'healthy' pitches of the unseen pitchers, we can further improve performance. This suggests that flipping an image is sufficient to match the learned motion information of an injured pitcher throwing with the other arm.
Arm    Acc   Prec   Rec   F1
Left-to-Right .22 .05 .02 .03
Right-to-Left .14 .07 .05 .05
Left-to-Right + Flip .27 .35 .42 .38
Right-to-Left + Flip .34 .38 .48 .44
Left-to-Right + Flip + ‘Healthy’ .57 .54 .57 .56
Right-to-Left + Flip + ‘Healthy’ .62 .56 .55 .56
Table 6. Classification of the left-handed model tested on right-handed pitchers and the right-handed model tested on left-handed pitchers. We also test horizontally flipping the images, which makes a left-handed pitcher appear as a right-handed pitcher (and vice versa).
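One subtlety worth noting: since the model's inputs are optical flow frames, mirroring them left-right also requires negating the horizontal flow component, because mirroring reverses the direction of horizontal motion. A minimal sketch, assuming flow clips are stored as (T, H, W, 2) arrays with (x, y) channels:

```python
import numpy as np

def flip_flow_horizontal(flow):
    """Mirror a flow clip left-right so a left-handed delivery appears
    right-handed (and vice versa). `flow` has shape (T, H, W, 2) with
    (x, y) flow components."""
    flipped = flow[:, :, ::-1, :].copy()  # mirror the width axis
    flipped[..., 0] *= -1  # mirrored horizontal motion points the other way
    return flipped
```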
5.3. Analysis of Injury Type
We can further analyze the model's performance on specific injuries. The 10 injuries in our dataset are: back strain, arm strain, finger blister, shoulder strain, UCL tear, intercostal strain, sternoclavicular joint, rotator cuff, hamstring strain, and groin strain. For this experiment, we train a separate model for left-handed and right-handed pitchers, then compare the evaluation metrics for each injury for each throwing arm. We use half the pitchers as training data, plus half the 'healthy' pitches from the other pitchers. We evaluate on the unseen 'injured' pitches and the other half of the unseen 'healthy' pitches.
In Table 7, we show our results. Our model performs quite well for most injuries, especially hamstring and back injuries. These likely lead to the most noticeable changes in a pitcher's motion, allowing the model to more easily determine if a pitcher is hurt. For some injuries, like finger blisters, our model performs quite poorly. Pitchers likely do not significantly change their motion due to a finger blister, as only the finger is affected.
5.4. How early can an injury be detected?
Following the best setting, we use half the pitchers, plus half of the 'healthy' pitches of the remaining pitchers, as training data, and evaluate on the remaining data (i.e., the setting used for Table 5). We vary k, the number of pitches thrown before being placed on the disabled list, to determine how early before injury the model can detect an injury. In Table 8, we show our results. The model performs best when given 10-30 'injured' samples, and produces poor results when the last 50 or more pitches are labeled as 'injured'. This suggests that 10-30 samples are enough to train the model while still containing sufficiently different motion patterns related to an injury. When using the last 50 or more pitches, the injury has not yet significantly impacted the pitcher's throwing motion.
Figure 3. Visualization using the method from [9] to visualize
an ‘injured’ pitch. This does not give an interpretable image of
why the decision was made, but shows that the model is capturing
spatio-temporal pitching motions.
6. Evaluating the Bias in the Dataset
To confirm that our model is not fitting to game-specific data, and that such game-specific information is not present in our optical flow input, we train an independent CNN model to predict which game a given pitch is from. The results, shown in Table 9, show that when given cropped optical flow as input, the model is unable to determine which game a pitch is from, but is able to when given RGB features. This confirms both that our cropped flow is a good input and that the model is not fitting to game-specific data.
We further analyze the model to confirm that our input does not suffer from temporal bias, by trying to predict the temporal ordering of pitches. Here, we give the model two pitches as input, and it must predict if the first pitch occurs before or after the second pitch. We only train this model on pitches from games where there was no injury, to confirm that the model is fitting to injury-related motions and not some other temporal feature. The results are shown in Table 10; we find that the model is unable to predict the temporal ordering of pitches. This suggests that the model is fitting to actual injury-related motion, and not some other temporal feature.
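A sketch of how such ordering pairs can be constructed is below; `pitches` is assumed to be a chronologically ordered list of flow clips from injury-free games, and the pair count is an arbitrary illustrative choice.

```python
import random

def make_ordering_pairs(pitches, n_pairs=1000):
    """Build (clip_a, clip_b, label) examples for the temporal-ordering
    probe: label is 1 if clip_a was thrown before clip_b, else 0.
    Chance accuracy is 0.5, which is roughly what Table 10 reports."""
    pairs = []
    for _ in range(n_pairs):
        i, j = random.sample(range(len(pitches)), 2)  # two distinct pitches
        pairs.append((pitches[i], pitches[j], 1 if i < j else 0))
    return pairs
```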
7. Discussion and Conclusions
We introduced the problem of detecting and predicting injuries in pitchers from only video data. However, there are many possible limitations of and extensions to our work.
                    Lefty                    Righty
Injury              Acc  Prec  Rec  F1       Acc  Prec  Rec  F1
Back Strain .95 .67 .74 .71 .95 .71 .76 .73
Finger Blister .64 .06 .02 .05 .64 .03 .01 .02
Shoulder Strain .94 .82 .89 .85 .96 .95 .92 .94
UCL Tear .92 .74 .72 .74 .94 .71 .67 .69
Intercostal Strain .94 .84 .87 .86 .92 .82 .84 .83
Sternoclavicular .92 .64 .68 .65 .93 .65 .67 .66
Rotator Cuff .86 .58 .61 .59 .86 .57 .60 .59
Hamstring Strain .98 .89 .92 .91 .99 .93 .95 .94
Groin Strain .93 .85 .83 .84 .92 .85 .84 .86
Table 7. Evaluation of the model for each injury. The model is able to detect some injuries well (e.g., hamstring strains) and others quite
poorly (e.g., finger blisters).
Arm      k = 10   k = 20   k = 30   k = 50   k = 75
Lefty .68 .63 .65 .48 .47
Righty .64 .67 .69 .52 .44
Table 8. F1 scores for models trained to predict the last k pitches as 'injured'. The model produces the most accurate predictions when the last 10 to 30 pitches are labeled as 'injured', but performs quite poorly for 50 or more pitches.
While we showed that CNNs can reliably detect and predict injuries, due to the somewhat limited size of our dataset and the scarcity of injury data in general, it is not clear exactly how well this will generalize to all pitchers, or to pitchers at different levels of baseball (e.g., high school pitchers throw much more slowly than professionals). While optical flow provides a reasonable input feature, it does lose some detailed information which could be beneficial for injury detection. The use of higher-resolution and higher frame-rate data could further improve performance. Further, since our method is based on CNNs, it is extremely difficult to determine why or how a decision is made. We applied the visualization method from Feichtenhofer et al. [9] to our model and data to try to interpret why a certain pitch was classified as an injury. However, this just provided a rough visualization over the pitcher's throwing motion, giving no real insight into the decision. We show an example visualization in Fig. 3. It confirms the model is capturing spatio-temporal pitching motions, but does not explain why or how the model detects injuries. This is perhaps the largest limitation of our work (and of CNN-based methods in general), as just a classification score is very limited information for athletes and trainers.
As many injuries in pitchers are due to overuse, representing an injury as a sequence of pitches, rather than treating each pitch as an individual event, could be beneficial. This would allow models to detect changes in motion or form over time, leading to better predictions and possibly more interpretable decisions. However, training such sequential models would require far more injury data to learn from, as 10-20 samples would not be enough. The use of additional data, both 'healthy' and 'injured', would further improve performance. Determining the optimal inputs and designing models specific to baseball/pitcher data could further help.
Finally, determining how early an injury would have to be detected/predicted to actually reduce recovery time remains unknown.
In conclusion, we proposed a new problem of detecting/predicting injuries in pitchers from only video data. We extensively evaluated the approach to determine how well it performs and generalizes for various pitchers and injuries, and how early reliable detection can be done.
References
[1] J. K. Aggarwal and M. S. Ryoo. Human activity analysis:
A review. ACM Computing Surveys, 43:16:1–16:43, April
2011. 1
[2] M. B. Andersen and J. M. Williams. A model of stress and
athletic injury: Prediction and prevention. Journal of sport
and exercise psychology, 10(3):294–306, 1988. 2
[3] J. Carreira and A. Zisserman. Quo vadis, action recognition?
a new model and the kinetics dataset. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recogni-
tion (CVPR), 2017. 1, 3
[4] H. Cole. Baseball loses 1.1 billion to pitching injuries over
five-year period, March 2015. [Online]. 1
[5] S. Conte, C. L. Camp, and J. S. Dines. Injury trends in major
league baseball over 18 seasons: 1998-2015. Am J Orthop,
45(3):116–123, 2016. 1
[6] A. Dave, O. Russakovsky, and D. Ramanan. Predictive-
corrective networks for action detection. arXiv preprint
arXiv:1704.03615, 2017. 1
[7] D. L. Falkstein. Prediction of athletic injury and postin-
jury emotional response in collegiate athletes: A prospec-
tive study of an NCAA Division I football team. PhD thesis,
University of North Texas, 1999. 2
Pitcher    Random Guess    RGB    Cropped RGB    Flow    Cropped Flow
Boone Logan 0.44 0.97 0.95 0.76 0.47
Clayton Kershaw 0.44 0.94 0.85 0.73 0.45
Adalberto Mejia 0.54 0.98 0.78 0.55 0.55
Aroldis Chapman 0.25 0.86 0.74 0.57 0.26
Table 9. Accuracy of predicting which game a pitch is from, using different input features. A lower value means the input data has less bias, which is better for injury detection. The model is able to accurately determine the game using RGB features. However, using the cropped flow, it achieves nearly random-guess performance, confirming that when using cropped flow, the model is not fitting to game-specific details. Note that random-guess accuracy varies depending on how many games (and pitches per game) are in the dataset for a given pitcher.
Pitcher Accuracy
Boone Logan 0.49
Clayton Kershaw 0.53
Adalberto Mejia 0.54
Aroldis Chapman 0.51
All Average 0.51
Table 10. Predicting if a pitch occurs before or after another pitch; random guessing yields 50%. We used only pitches from games where no injury occurred, to determine if the model was able to find any temporal relationship between the pitches. Ideally, this accuracy would be 0.5, meaning that the pitch ordering appears random. We find the model is not able to predict the ordering of pitches, suggesting that it is fitting to actual motion differences caused by injury and not to unrelated temporal data.
[8] C. Feichtenhofer, H. Fan, J. Malik, and K. He. Slow-
fast networks for video recognition. arXiv preprint
arXiv:1812.03982, 2018. 1
[9] C. Feichtenhofer, A. Pinz, R. P. Wildes, and A. Zisserman.
What have we learned from deep representations for ac-
tion recognition? In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 7844–
7853, 2018. 5, 6
[10] M. Harada, M. Takahara, J. Sasaki, N. Mura, T. Ito, and
T. Ogino. Using sonography for the early detection of elbow
injuries among young baseball players. American Journal of
Roentgenology, 187(6):1436–1441, 2006. 2
[11] A. C. Hergenroeder. Prevention of sports injuries. Pediatrics,
101(6):1057–1063, 1998. 1
[12] A. Hreljac. Etiology, prevention, and early intervention of
overuse injuries in runners: a biomechanical perspective.
Physical Medicine and Rehabilitation Clinics, 16(3):651–
667, 2005. 1
[13] A. Ivarsson, U. Johnson, M. B. Andersen, U. Tranaeus,
A. Stenling, and M. Lindwall. Psychosocial factors and sport
injuries: meta-analyses for prediction and prevention. Sports
medicine, 47(2):353–365, 2017. 2
[14] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar,
and L. Fei-Fei. Large-scale video classification with convo-
lutional neural networks. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition (CVPR),
pages 1725–1732, 2014. 1
[15] W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vi-
jayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, et al.
The kinetics human action video dataset. arXiv preprint
arXiv:1705.06950, 2017. 1, 3
[16] M. Lapinski, E. Berkson, T. Gill, M. Reinold, and J. A. Par-
adiso. A distributed wearable, wireless sensor system for
evaluating professional baseball pitchers and batters. In 2009
International Symposium on Wearable Computers, pages
131–138. IEEE, 2009. 2
[17] S. Lyman, G. S. Fleisig, J. R. Andrews, and E. D. Osinski.
Effect of pitch type, pitch count, and pitching mechanics on
risk of elbow and shoulder pain in youth baseball pitchers.
The American journal of sports medicine, 30(4), 2002. 1
[18] R. Maddison and H. Prapavessis. A psychological approach
to the prediction and prevention of athletic injury. Journal of
Sport and Exercise Psychology, 27(3):289–310, 2005. 2
[19] N. E. Marshall, T. R. Jildeh, K. R. Okoroha, A. Patel,
V. Moutzouros, and E. C. Makhni. Implications of core and
hip injuries on major league baseball pitchers on the disabled
list. Arthroscopy: The Journal of Arthroscopic & Related
Surgery, 34(2):473–478, 2018. 1
[20] S. K. Mehdi, S. J. Frangiamore, and M. S. Schickendantz.
Latissimus dorsi and teres major injuries in major league
baseball pitchers: a systematic review. American Journal
of Orthopedics, 45(3):163–167, 2016. 1
[21] M. Monfort, A. Andonian, B. Zhou, K. Ramakrishnan, S. A.
Bargal, Y. Yan, L. Brown, Q. Fan, D. Gutfreund, C. Von-
drick, et al. Moments in time dataset: one million videos for
event understanding. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 2019. 1
[22] N. B. Murray, G. M. Black, R. J. Whiteley, P. Gahan, M. H.
Cole, A. Utting, and T. J. Gabbett. Automatic detection of
pitching and throwing events in baseball with inertial mea-
surement sensors. International journal of sports physiology
and performance, 12(4):533–537, 2017. 2
[23] J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan,
O. Vinyals, R. Monga, and G. Toderici. Beyond short snip-
pets: Deep networks for video classification. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 4694–4702. IEEE, 2015. 1
[24] A. Piergiovanni, A. Angelova, A. Toshev, and M. S. Ryoo.
Evolving space-time neural architectures for videos. arXiv
preprint arXiv:1811.10636, 2018. 1
[25] A. Piergiovanni, C. Fan, and M. S. Ryoo. Learning latent
sub-events in activity videos using temporal attention filters.
In Proceedings of the American Association for Artificial In-
telligence (AAAI), 2017. 1
[26] A. Piergiovanni and M. S. Ryoo. Fine-grained activity recog-
nition in baseball videos. In CVPR Workshop on Computer
Vision in Sports, 2018. 1
[27] A. Piergiovanni and M. S. Ryoo. Learning latent super-
events to detect multiple activities in videos. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2018. 1
[28] A. Piergiovanni and M. S. Ryoo. Representation flow for
action recognition. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, 2019. 1
[29] M. Pontillo, B. A. Spinelli, and B. J. Sennett. Prediction of
in-season shoulder injury from preseason testing in division
i collegiate football players. Sports Health, 6(6):497–503,
2014. 2
[30] M. S. Ryoo and L. Matthies. First-person activity recogni-
tion: What are they doing to me? In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recogni-
tion (CVPR), 2013. 1
[31] L. Sevilla-Lara, Y. Liao, F. Güney, V. Jampani, A. Geiger,
and M. J. Black. On the integration of optical flow and action
recognition. In German Conference on Pattern Recognition,
pages 281–297, 2018. 2
[32] Z. Shou, J. Chan, A. Zareian, K. Miyazawa, and S.-F. Chang.
Cdc: Convolutional-de-convolutional networks for precise
temporal action localization in untrimmed videos. arXiv
preprint arXiv:1703.01515, 2017. 1
[33] K. Simonyan and A. Zisserman. Two-stream convolutional
networks for action recognition in videos. In Advances in
Neural Information Processing Systems (NIPS), pages 568–
576, 2014. 1, 2
[34] D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and
M. Paluri. C3d: generic features for video analysis. CoRR,
abs/1412.0767, 2(7):8, 2014. 1
[35] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action
recognition by dense trajectories. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recog-
nition (CVPR), pages 3169–3176. IEEE, 2011. 1