Fezmid Posted September 13, 2011 Has anybody found a source for NFL play-by-play data in XML format? I've found a couple of sites that seem to sell that data, but I was hoping for free (even if it's years old). Not sure if anybody here has come across anything like that or not. The reason I ask is that I'm preparing to build a data warehouse of all play-by-play data so that you can ask interesting questions like, "How often do the Bills run the ball on 3rd and less than 2 on the road when trailing by 14 points in the snow?" Ambitious, but we'll see what happens. Thanks! (EDIT: I did find this site, and it contains plays in CSV. Might be able to use it, but I think XML fits into the project better.) http://www.armchairanalysis.com/nfl-play-by-play-data.php
In-A-Gadda-Levitre Posted September 13, 2011 You can get free 2000-2010 play-by-play data here in CSV format. There are a number of CSV-to-XML converters available, like this one.
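For anyone who would rather not rely on a third-party converter, here is a minimal sketch of the CSV-to-XML step using only Python's standard library. The column names (`gameid`, `qtr`, `down`, `togo`, `type`) and the two sample rows are made up for illustration and are not the actual layout of the CSV mentioned above. It also assumes every column header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="plays", row_tag="play"):
    """Convert CSV text to an XML string, one child element per CSV column."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        play = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Assumes the header is usable as an XML tag (no spaces, etc.).
            ET.SubElement(play, column).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, not real play-by-play records.
SAMPLE = "gameid,qtr,down,togo,type\n1,1,3,2,RUSH\n1,1,1,10,PASS\n"

xml_text = csv_to_xml(SAMPLE)
print(xml_text)
```

For a real file, read its contents with `open(path, newline="")` and pass the text to `csv_to_xml`.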
CodeMonkey Posted September 13, 2011 FYI, the site that has the CSV is flagged as a gambling site, for those of you (like me) who were thinking of accessing it from work.
Fezmid (Author) Posted September 13, 2011 The NFL has agreed to give me a week's worth of data, although I'm asking if they can give me 16 weeks' worth of an individual team's data instead of one week of everyone's data. We'll see what they say, but I was impressed that the NFL wrote me back.
Fezmid (Author) Posted October 11, 2011 Well, the NFL sent me a season of data in XML for the Panthers... Unfortunately, they just sent it a couple of days ago, so I didn't have time to use it in my project. The good news is that I used the data from ArmChairAnalysis.com and was able to get my data warehouse up and running with data from every game back to the year 2000! Now to come up with cool queries for it. For example, did you know that since 2000, the Bills have run the ball 55.28% of the time on the first play of a drive? However, if you only take home games into account, that number rises to 58.09%, and it is only 53.61% on the road. I would've thought they'd run the ball more on the road to start a drive, but I guess not. I can't wait to play around with this data. I have 473,621 plays in this database across 2,921 games!
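A query like the "run percentage on the first play of a drive, home vs. road" one above could look roughly like the following sketch. The table layout (`plays` with `offense`, `is_home`, `drive_play_no`, `play_type`) is a hypothetical schema, not the actual warehouse design, and the handful of rows are invented so the example runs on its own with SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per play.
conn.execute(
    """CREATE TABLE plays (
           offense TEXT, is_home INTEGER, drive_play_no INTEGER, play_type TEXT)"""
)

# Invented sample rows, not real play-by-play data.
rows = [
    ("BUF", 1, 1, "RUSH"),
    ("BUF", 1, 1, "PASS"),
    ("BUF", 0, 1, "RUSH"),
    ("BUF", 0, 1, "PASS"),
    ("BUF", 0, 1, "PASS"),
    ("BUF", 1, 2, "RUSH"),  # second play of a drive, filtered out below
    ("NE", 1, 1, "RUSH"),   # other team, filtered out below
]
conn.executemany("INSERT INTO plays VALUES (?, ?, ?, ?)", rows)

# Share of first-play-of-drive snaps that were runs, split by home/road.
QUERY = """
SELECT is_home,
       ROUND(100.0 * SUM(play_type = 'RUSH') / COUNT(*), 2) AS run_pct
FROM plays
WHERE offense = ? AND drive_play_no = 1
GROUP BY is_home
ORDER BY is_home
"""
run_pct = dict(conn.execute(QUERY, ("BUF",)).fetchall())
print(run_pct)  # keys: 0 = road, 1 = home
```

Dropping the `GROUP BY` gives the overall figure. Adding more columns to the `WHERE` clause (down, distance, score margin, weather) is how the more specific questions from the first post would be expressed.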
Big Turk Posted October 11, 2011 I actually programmed a PBP parser for a football game I was developing that sort of did the same thing...
Fezmid (Author) Posted October 11, 2011 Initially that's how I wanted to get my data: by parsing the PBP logs on NFL.com. However, what they publish on their site doesn't follow any sort of standard, so parsing would be very difficult and rife with errors. I have a decent data set now though, and I can't wait to play with it.
Big Turk Posted October 11, 2011 (edited) I copied them to doc files, standardized the language, and made use of a lot of regex. It works perfectly fine: I pulled all the pertinent data, stuffed it into an SQL DB, and then ran all kinds of sorts and stored procedures. Edited October 11, 2011 by matter2003
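A rough sketch of the regex step described above, assuming the log lines have already been standardized. The line format used here (`down-togo-TEAM yardline (clock) TYPE ... for N yards`) and the player name are invented for illustration. This is not NFL.com's format or the parser from the post.

```python
import re

# Hypothetical standardized line, e.g.:
#   "3-2-BUF 45 (10:32) RUSH J.Doe for 3 yards"
PLAY_RE = re.compile(
    r"^(?P<down>[1-4])-(?P<togo>\d+)-(?P<side>[A-Z]{2,3}) (?P<yardline>\d{1,2}) "
    r"\((?P<clock>\d{1,2}:\d{2})\) (?P<type>RUSH|PASS) .*? for (?P<yards>-?\d+) yards?"
)


def parse_play(line):
    """Return a dict of fields for one standardized play line, or None if it doesn't match."""
    match = PLAY_RE.match(line)
    if match is None:
        return None
    play = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("down", "togo", "yardline", "yards"):
        play[key] = int(play[key])
    return play


print(parse_play("3-2-BUF 45 (10:32) RUSH J.Doe for 3 yards"))
```

Each parsed dict maps naturally onto one row of a plays table. Lines that return `None` are the ones that still need their wording standardized.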