MLB TweetBot

I like when pitchers get a hit.  Can I make a Tweet Bot to tweet out any time this happens?

I am a big fan of pitchers getting hits.  In general, they are so bad at hitting that half the league (the American League, with its designated hitter) doesn’t even make them pick up a bat and try.  It always brings a smile to my face when a pitcher gets a hit.  TAKE THAT AL!

Can I do this?

Easy!  There are really only 3 main parts:

  1. Pull in real-time MLB pitch-by-pitch data
  2. Check each at-bat for its result and the batter
  3. Tweet something to get the message across

Pulling in MLB data was way easier than I expected.  Panzarino posted a repo on GitHub that pulls a massive amount of data for any game by reading MLB Gameday XML data.  His code can be used to pull whole seasons of data, but I am only checking the current day.  A tweet bot that tweets about past days isn’t exactly useful.

Using the infrastructure he shared, I can loop through all the games scheduled for the current day, and through each at-bat of every game.  Each at-bat contains every pitch and the result of the at-bat, which tells me whether there was a hit.  The at-bat information also holds the batter’s MLB Player ID (which you can see on their profiles on MLB.com), but not the batter’s name or position.
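A minimal sketch of that loop for a single game, assuming each game’s at-bats come as Gameday-style XML in which every `<atbat>` element carries `num`, `batter`, `pitcher`, and `event` attributes and wraps its `<pitch>` elements. The sample XML and the `HIT_EVENTS` names are illustrative assumptions, not Panzarino’s code or a verified copy of the Gameday schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on Gameday inning XML (assumed structure)
SAMPLE_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" batter="111111" pitcher="222222" event="Single">
        <pitch type="B"/><pitch type="X"/>
      </atbat>
      <atbat num="2" batter="333333" pitcher="222222" event="Strikeout">
        <pitch type="S"/><pitch type="S"/><pitch type="S"/>
      </atbat>
    </top>
  </inning>
</game>
"""

# Event names assumed to count as hits
HIT_EVENTS = {"Single", "Double", "Triple", "Home Run"}


def find_hits(xml_text):
    """Return (at-bat number, batter ID, event) for every at-bat that ended in a hit."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    # iter() visits every <atbat>, whatever inning or half-inning it sits in
    for atbat in root.iter("atbat"):
        event = atbat.get("event")
        if event in HIT_EVENTS:
            hits.append((int(atbat.get("num")), atbat.get("batter"), event))
    return hits


if __name__ == "__main__":
    for num, batter_id, event in find_hits(SAMPLE_XML):
        print(num, batter_id, event)
```

Covering the whole day is then an outer loop that calls `find_hits` once per scheduled game.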

To identify the batter, I needed to connect the batter’s Player ID to their position somehow.  After some good ol’ fashioned Googling, I found CaptainCrunch’s website, which has a CSV file that connects each Player ID to the player’s name and position.  His CSV is regularly updated with new players, it does exactly what I need, and it is free to use.  Perfect!
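Loading that CSV into a dictionary keyed by Player ID turns each lookup into a single dictionary access. This is a sketch assuming the header has columns named `mlb_id`, `mlb_name`, and `mlb_pos`; those column names and the sample rows are hypothetical, so check them against the real file.

```python
import csv
import io

# Hypothetical excerpt of the player ID CSV (column names are assumptions)
SAMPLE_CSV = """mlb_id,mlb_name,mlb_pos
111111,Example Pitcher,P
333333,Example Shortstop,SS
"""


def load_players(csv_file):
    """Map each MLB Player ID (kept as a string) to a (name, position) tuple."""
    players = {}
    for row in csv.DictReader(csv_file):
        # Skip rows with a blank ID, in case some players lack one
        if row["mlb_id"]:
            players[row["mlb_id"]] = (row["mlb_name"], row["mlb_pos"])
    return players


if __name__ == "__main__":
    players = load_players(io.StringIO(SAMPLE_CSV))
    print(players["111111"])
```

For the downloaded file, pass `open(path, newline="")` instead of the `StringIO` sample; IDs stay strings so they match the attribute values read from the XML.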

Combining the MLB XML data and the player information CSV file, I can find all hits in a game, who made each hit, and what position that player normally plays.
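The join itself is a dictionary lookup per hit. A sketch, assuming hits arrive as (at-bat number, batter ID, event) tuples and players as an ID to (name, position) dictionary; the set of codes treated as “pitcher” (`P`, `SP`, `RP`) is an assumption about how the CSV labels positions.

```python
# Position codes assumed to mean "pitcher" in the player CSV
PITCHER_POSITIONS = {"P", "SP", "RP"}


def describe_hits(hits, players):
    """Attach the batter's name and usual position to each hit.

    hits: list of (at-bat number, batter ID, event)
    players: dict mapping batter ID -> (name, position)
    """
    described = []
    for num, batter_id, event in hits:
        # Fall back to placeholders if the ID is missing from the CSV
        name, position = players.get(batter_id, ("Unknown", "?"))
        described.append({"atbat": num, "name": name, "position": position, "event": event})
    return described


def hits_by_pitchers(hits, players):
    """Keep only the hits whose batter normally plays pitcher."""
    return [h for h in describe_hits(hits, players) if h["position"] in PITCHER_POSITIONS]


if __name__ == "__main__":
    hits = [(1, "111111", "Single"), (7, "333333", "Double")]
    players = {"111111": ("Example Pitcher", "P"), "333333": ("Example Shortstop", "SS")}
    print(hits_by_pitchers(hits, players))
```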

Tweeting something out is so easy that even the president finds time to send a few in his downtime.  Automatically tweeting something is even easier, as tons of people have put up guides on how to set it up.  Following one of those guides, I made an account and created an app (to get an access token and secret).  After learning about OAuth, the Python to tweet something is a cakewalk:

```python
import tweepy

# Authenticate with the app's OAuth credentials (placeholders; use your own)
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Post the message as a new status
text_to_tweet = "A pitcher just got a hit!"
tweet = api.update_status(status=text_to_tweet)
```

What should I do with it?

Automate it!  MLB games don’t start before 11am CT, and they usually wrap up before midnight.  Baseball games are fairly slow, so I can accept a slight delay between when the hit takes place and when the tweet goes out.  I have a Unix server, so I can schedule my code to start at 11am and loop through all games every 5 minutes to check for hits.  Each pass would see every hit from earlier in the day again, so I keep a text file of each hit I have tweeted and check it to make sure I only tweet each one once.
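A minimal sketch of the once-only check, assuming each event gets a unique string key (for example a hypothetical game ID plus at-bat number) stored one per line in a plain text file. The `send` argument stands in for whatever function actually posts the tweet.

```python
from pathlib import Path


def load_seen(path):
    """Read previously tweeted keys, one per line; a missing file means none yet."""
    p = Path(path)
    if not p.exists():
        return set()
    return {line for line in p.read_text().splitlines() if line}


def tweet_new_events(events, path, send):
    """Tweet each (key, text) event whose key is not yet in the file, then record it.

    Returns the texts sent on this pass.
    """
    seen = load_seen(path)
    sent = []
    with open(path, "a") as log:
        for key, text in events:
            if key in seen:
                continue  # already tweeted on an earlier pass
            send(text)  # if this raises, the key is not recorded and is retried next pass
            log.write(key + "\n")
            seen.add(key)
            sent.append(text)
    return sent
```

Every 5-minute pass can then hand the whole day’s events to `tweet_new_events`; only events not already in the file get posted.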

Something worth doing is worth overdoing.  Why just tweet when pitchers get hits when I could tweet more?  A friend suggested that in addition to hits by pitchers, I should also tweet hit-by-pitches, which turned out to be even simpler to implement than hits by pitchers!

This started to become confusing, so naturally I made it worse by also identifying pitches by hitters (that is, pitches thrown by non-pitchers).  I briefly considered pitchers hit by pitches, and hitters getting hits off hitters, but I am saving those for another day.  For now, I am going to leave it at:

  1. Hits by Pitchers
  2. Hit by Pitches
  3. Pitches by Hitters
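One way to sort an at-bat into these three categories is sketched below, assuming the at-bat record provides the batter ID, pitcher ID, and event name, and that the player CSV supplies each player’s usual position. The event strings (`"Hit By Pitch"` and the hit names) and the pitcher position codes are assumptions about the data.

```python
# Event names and pitcher position codes are assumptions about the data
HIT_EVENTS = {"Single", "Double", "Triple", "Home Run"}
HBP_EVENT = "Hit By Pitch"
PITCHER_POSITIONS = {"P", "SP", "RP"}


def classify_atbat(batter_id, pitcher_id, event, positions):
    """Return the tweet-worthy categories this at-bat falls into.

    positions maps a player ID to that player's usual position.
    """
    categories = []
    batter_is_pitcher = positions.get(batter_id) in PITCHER_POSITIONS
    pitcher_is_pitcher = positions.get(pitcher_id) in PITCHER_POSITIONS
    if event in HIT_EVENTS and batter_is_pitcher:
        categories.append("Hits by Pitchers")
    if event == HBP_EVENT:
        categories.append("Hit by Pitches")
    # Only flag a known player whose usual position is not pitcher;
    # an ID missing from the CSV is ignored rather than guessed at
    if pitcher_id in positions and not pitcher_is_pitcher:
        categories.append("Pitches by Hitters")
    return categories
```

A position player who faces several batters would match on every one of those at-bats, so the once-only check for that category probably wants a key built from the game and pitcher rather than the at-bat.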

I am hoping these tweets can entertain people, but I don’t want to tweet so much that I begin to annoy them.  I compiled a list of all 3 of these events over the first half of the season and found that I would have tweeted 15 or fewer times on 75% of days.
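The 75% figure is a share-of-days calculation. A sketch, assuming one logged date per event (each event being one tweet) and a separate list of every game day, so that days with no events count as zero instead of silently dropping out.

```python
from collections import Counter


def daily_tweet_counts(event_dates, game_days):
    """Tweets per game day; days with no qualifying events count as zero."""
    per_day = Counter(event_dates)
    return [per_day[day] for day in game_days]


def share_of_days_at_or_below(counts, threshold):
    """Fraction of days whose tweet count is at most `threshold`."""
    if not counts:
        raise ValueError("need at least one day of counts")
    return sum(1 for c in counts if c <= threshold) / len(counts)
```

With made-up counts of 20, 3, 15, and 0 tweets over four days, `share_of_days_at_or_below(counts, 15)` returns 0.75.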

In addition to the daily, real-time tweets, I also wrote a second script to tweet out the weekly count of the 3 events.  I think enough people out there would appreciate seeing this, so I might as well share what I have been able to do.  If the Tweet Bot gets enough followers, there may be an opportunity to monetize it, and it is usually worth keeping those kinds of options open.
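A sketch of the weekly tally, assuming the real-time script logs each event as a (date, category) pair. The seven-day window ending the day before the run and the text layout are illustrative choices, not the bot’s actual format.

```python
from collections import Counter
from datetime import date, timedelta

CATEGORIES = ("Hits by Pitchers", "Hit by Pitches", "Pitches by Hitters")


def weekly_summary(events, today):
    """Summarize (date, category) events from the 7 days before `today`."""
    start = today - timedelta(days=7)
    counts = Counter(cat for day, cat in events if start <= day < today)
    # Counter returns 0 for categories that never occurred that week
    lines = [f"{cat}: {counts[cat]}" for cat in CATEGORIES]
    return f"Week of {start.isoformat()}\n" + "\n".join(lines)


if __name__ == "__main__":
    log = [(date(2017, 7, 3), "Hits by Pitchers"), (date(2017, 7, 5), "Hit by Pitches")]
    print(weekly_summary(log, date(2017, 7, 9)))
```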

What did I learn?

In an average game, there are 0.20 Hits by Pitchers, 0.58 Hit by Pitches, and 0.01 Pitches by Hitters.  Over the course of a 162-game season, that means each team should have a pitcher get a hit about 33 times (not accounting for the NL vs. AL rules), have about 94 batters hit by a pitch, and have a non-pitcher throw at least one pitch about twice.
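The season figures are the per-game rates multiplied by 162 games. A quick check using the rates as printed; because they are rounded to two decimals, 0.20 × 162 comes out to 32.4, a little under the quoted 33, while 0.58 × 162 ≈ 94.0 and 0.01 × 162 ≈ 1.6.

```python
GAMES_PER_SEASON = 162

# Per-game rates as quoted in the text (rounded to two decimals)
PER_GAME = {"Hits by Pitchers": 0.20, "Hit by Pitches": 0.58, "Pitches by Hitters": 0.01}


def season_projection(rate, games=GAMES_PER_SEASON):
    """Expected count over a season: per-game rate times games played."""
    return rate * games


if __name__ == "__main__":
    for event, rate in PER_GAME.items():
        print(f"{event}: {season_projection(rate):.1f}")
```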

The real-time and weekly tweets require two different scripts, but they share many of the same user-defined functions (like one to tweet a text string) and the same Twitter credentials.  I didn’t want to have to change both files each time I tweaked some code, so I learned how to create my own Python module that both scripts can use.  By having each script import my module, I drastically simplified each script and made both easier to update in the future.  If I need to change Twitter profiles, a hashtag, or many other parts of the code, I can change it in the module and both scripts will pick up the changes.
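A sketch of what such a shared module could hold, under a hypothetical file name `tweetbot_common.py`. The hashtag value and the environment variable names for the credentials are assumptions; tweepy is imported inside `get_api` so the formatting helpers can be used (and tested) without it installed.

```python
"""tweetbot_common.py: helpers shared by the real-time and weekly scripts (hypothetical)."""
import os

# One place to change the hashtag for both scripts (illustrative value)
HASHTAG = "#MLB"


def format_tweet(text, hashtag=HASHTAG):
    """Append the shared hashtag to a message."""
    return f"{text} {hashtag}"


def get_api():
    """Build an authenticated tweepy API object from environment variables (assumed names)."""
    import tweepy  # imported here so the rest of the module works without tweepy

    auth = tweepy.OAuthHandler(os.environ["CONSUMER_KEY"], os.environ["CONSUMER_SECRET"])
    auth.set_access_token(os.environ["ACCESS_TOKEN"], os.environ["ACCESS_TOKEN_SECRET"])
    return tweepy.API(auth)


def tweet(api, text):
    """Post one status with the shared hashtag attached."""
    return api.update_status(status=format_tweet(text))
```

With the file in the same directory as both scripts (or anywhere on the Python path), each script can start with `from tweetbot_common import get_api, tweet` and call `tweet(get_api(), "A pitcher just got a hit!")`.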

Be sure to follow the MLB Tweetbots here:

PitchHitting – Tweets a weekly summary

HitByPitches – Tweets when a hitter gets hit by a pitch

HitsByPitchers – Tweets when a pitcher gets a hit

PitchByHitter – Tweets when a hitter throws a pitch


For a sneak preview, here is @PitchHitting: