dc.contributor.author: Hammerling, Dorit [en_US]
dc.contributor.author: Cefalu, Matthew [en_US]
dc.contributor.author: Cisewski, Jessi [en_US]
dc.contributor.author: Dominici, Francesca [en_US]
dc.contributor.author: Parmigiani, Giovanni [en_US]
dc.contributor.author: Paulson, Charles [en_US]
dc.contributor.author: Smith, Richard L. [en_US]
dc.date.accessioned: 2014-05-06T16:16:46Z
dc.date.issued: 2014 [en_US]
dc.identifier.citation: Hammerling, Dorit, Matthew Cefalu, Jessi Cisewski, Francesca Dominici, Giovanni Parmigiani, Charles Paulson, and Richard L. Smith. 2014. “Completing the Results of the 2013 Boston Marathon.” PLoS ONE 9 (4): e93800. doi:10.1371/journal.pone.0093800. http://dx.doi.org/10.1371/journal.pone.0093800. [en]
dc.identifier.issn: 1932-6203 [en]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:12152873
dc.description.abstract: The 2013 Boston marathon was disrupted by two bombs placed near the finish line. The bombs resulted in three deaths and several hundred injuries. Of lesser concern, in the immediate aftermath, was the fact that nearly 6,000 runners failed to finish the race. We were approached by the marathon's organizers, the Boston Athletic Association (BAA), and asked to recommend a procedure for projecting finish times for the runners who could not complete the race. With assistance from the BAA, we created a dataset consisting of all the runners in the 2013 race who reached the halfway point but failed to finish, as well as all runners from the 2010 and 2011 Boston marathons. The data consist of split times from each of the 5 km sections of the course, as well as the final 2.2 km (from 40 km to the finish). The statistical objective is to predict the missing split times for the runners who failed to finish in 2013. We set this problem in the context of the matrix completion problem, examples of which include imputing missing data in DNA microarray experiments, and the Netflix prize problem. We propose five prediction methods and create a validation dataset to measure their performance by mean squared error and other measures. The best method used local regression based on a K-nearest-neighbors algorithm (KNN method), though several other methods produced results of similar quality. We show how the results were used to create projected times for the 2013 runners and discuss potential for future application of the same methodology. We present the whole project as an example of reproducible research, in that we are able to make the full data and all the algorithms we have used publicly available, which may facilitate future research extending the methods or proposing completely different approaches. [en]
dc.language.iso: en_US [en]
dc.publisher: Public Library of Science [en]
dc.relation.isversionof: doi:10.1371/journal.pone.0093800 [en]
dc.relation.hasversion: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3984103/pdf/ [en]
dash.license: LAA [en_US]
dc.subject: Biology and Life Sciences [en]
dc.subject: Psychology [en]
dc.subject: Behavior [en]
dc.subject: Human Performance [en]
dc.subject: Computer and Information Sciences [en]
dc.subject: Computing Methods [en]
dc.subject: Information Technology [en]
dc.subject: Medicine and Health Sciences [en]
dc.subject: Sports and Exercise Medicine [en]
dc.subject: Physical Sciences [en]
dc.subject: Mathematics [en]
dc.subject: Applied Mathematics [en]
dc.subject: Algorithms [en]
dc.subject: Probability Theory [en]
dc.subject: Statistical Distributions [en]
dc.subject: Statistics (Mathematics) [en]
dc.subject: Biostatistics [en]
dc.subject: Statistical Methods [en]
dc.subject: Social Sciences [en]
dc.title: Completing the Results of the 2013 Boston Marathon [en]
dc.type: Journal Article [en_US]
dc.description.version: Version of Record [en]
dc.relation.journal: PLoS ONE [en]
dash.depositing.author: Cefalu, Matthew [en_US]
dc.date.available: 2014-05-06T16:16:46Z
dc.identifier.doi: 10.1371/journal.pone.0093800 [*]
dash.contributor.affiliated: Cefalu, Matthew Steven
dash.contributor.affiliated: Parmigiani, Giovanni
dash.contributor.affiliated: Dominici, Francesca
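
The abstract describes the best-performing approach only as local regression based on a K-nearest-neighbors algorithm. As a rough illustration of that general idea, and not a reproduction of the authors' published code, the sketch below finds the k finishers whose observed split times are closest to those of a runner who did not finish, fits a straight line to those neighbors, and uses it to project a finish time. The function name `knn_local_regression`, the choice of the last observed split as the single regressor, and all numbers are hypothetical and invented for this example.

```python
import numpy as np

def knn_local_regression(observed, finisher_splits, finisher_times, k=3):
    """Project a finish time from the k most similar finishers.

    Hypothetical helper for illustration only; not the authors' code.
    observed:        cumulative split times recorded for the stopped runner
    finisher_splits: the same splits for runners who finished (one row each)
    finisher_times:  finish times of those runners
    """
    observed = np.asarray(observed, dtype=float)
    finisher_splits = np.asarray(finisher_splits, dtype=float)
    finisher_times = np.asarray(finisher_times, dtype=float)

    # Euclidean distance between the runner and every finisher
    distances = np.linalg.norm(finisher_splits - observed, axis=1)
    nearest = np.argsort(distances)[:k]

    # Local linear fit: finish time against the last observed split,
    # using only the k nearest finishers (an assumption of this sketch)
    x = finisher_splits[nearest, -1]
    y = finisher_times[nearest]
    slope, intercept = np.polyfit(x, y, 1)
    return slope * observed[-1] + intercept

# Made-up cumulative times in minutes at three checkpoints
finisher_splits = [
    [50, 100, 150],
    [55, 110, 165],
    [60, 120, 180],
    [70, 140, 210],
    [80, 160, 240],
]
# Made-up finish times for the same five runners
finisher_times = [212, 233, 254, 296, 338]

predicted = knn_local_regression([57, 114, 171], finisher_splits, finisher_times, k=3)
print(f"Projected finish time: {predicted:.1f} minutes")
```

A practical version would need to choose k and the regressors by validation, as the abstract indicates the authors did when comparing their five methods by mean squared error.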