The Phystat Repository for Physics Statistics Code
M. Fischler, J. Linnemann, M. Paterno, P. Canal
phystat.org
SAMSI, March 7, 2006, Duke University

The phystat.org Repository

• A broadly accessible collection of
  – Tools and utilities
  – Modules and libraries
  – Code fragments and technical documentation
  pertaining to statistics used in physics
• The idea emerged as an adjunct of the PHYSTAT conferences on Statistical Problems in Particle Physics, Astrophysics, and Cosmology
  – A small workshop was held in August at FNAL

Observations at PHYSTAT and at the Workshop:

• Many of the papers presented at PHYSTAT05 (Oxford) and PHYSTAT03 (SLAC) would benefit from a common place to cite code and technical expositions of statistical techniques
  – Citing a package for more detail about what was done in a physics publication is a primary motivator for the Phystat repository
• Many participants have code modules and tools that they would like to make more readily available to the physics community

A Useful Statistics Repository Would Contain

• Tools and utilities
  – Useful stand-alone packages
• Modules and libraries
  – Working code intended as building blocks for others’ programs
• Major integrated toolsets
• Code fragments
  – Illustrating the precise statistical algorithms applied in major experiments’ analyses
  – Not necessarily intended to run intact outside their original environment
• Technical documentation of statistical algorithms
  – Perhaps more detailed than would be appropriate for archival journal papers

Does Such a Repository Have to Be Created?

• Existing arXiv-style repositories
  – Are not a place for code and libraries
• Existing code repositories (e.g., SourceForge, R Project)
  – Would not be appropriate for code fragments or expositions documenting experiments’ algorithms
  – Physics statistics code would get lost in the mass of packages
• Code collections by individual physicists
  – Continuity issues: will they still be there in 10 years?

The Phystat Repository Strategy

• Institutional responsibility is key
  – To ensure that archived material will remain available over time
  – Assigned package numbers (e.g., PHYSTAT/0603-001/v2) will be suitable for use as citations, without concern that they will become invalid
• We should be as inclusive as possible
  – No restrictions based on which platforms or languages a package works with
  – No wrestling over acceptance/refereeing
  – The broadest possible acceptance of licensing approaches
• Don’t be too ambitious
  – The repository content will come from the community, not from the repository maintainers
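The slides spell citation numbers in two ways. One is the long form PHYSTAT/0603-001/v2 used above. The other is the compact, arXiv-like phystat.org/0603004v2 used on a later slide. The sketch below parses both. It assumes that each form encodes a YYMM submission month, a three-digit serial number, and an optional version. That interpretation is inferred from the examples and is not a published specification. The names `PhystatId` and `parse_phystat_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PhystatId(NamedTuple):
    """A parsed citation number (field names are hypothetical)."""
    yymm: str               # assumed submission year and month, e.g. "0603"
    serial: int             # assumed three-digit serial number within that month
    version: Optional[int]  # None when the citation names no version


# Long form, as in "PHYSTAT/0603-001/v2" (the "/vN" suffix is optional).
_LONG = re.compile(r"PHYSTAT/(\d{4})-(\d{3})(?:/v(\d+))?")
# Compact arXiv-like form, as in "phystat.org/0603004v2" ("vN" optional).
_SHORT = re.compile(r"phystat\.org/(\d{4})(\d{3})(?:v(\d+))?")


def parse_phystat_id(text: str) -> PhystatId:
    """Parse either citation spelling; raise ValueError for anything else."""
    for pattern in (_LONG, _SHORT):
        m = pattern.fullmatch(text.strip())
        if m:
            yymm, serial, version = m.groups()
            return PhystatId(yymm, int(serial), int(version) if version else None)
    raise ValueError(f"not a recognized phystat citation: {text!r}")
```

For example, `parse_phystat_id("PHYSTAT/0603-001/v2")` returns `PhystatId(yymm='0603', serial=1, version=2)`. A missing version is kept as `None` rather than defaulting to 1. That way a reader can tell whether a citation pinned a specific version.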

phystat.org

• Universal download access
  – Sophisticated search and browsing aids
  – Multi-view classification of contents
• Mildly moderated content submission
  – As unrestrictive as possible
• Support for value-added features
  – User comments
  – Validation and endorsement
• FNAL Computing Division commitment
  – Support for the site mechanism, archival storage, and content moderation

Intended Scope of the Repository

• Hypothesis testing
  – Model comparison
  – Classical and Bayesian tests
• Fitting/parameter estimation
• Limit setting
• Categorization
  – Decision trees, neural nets, …
• Random distribution generation
• {Your suggestions here}
  – E.g., if people feel Phystat is a good place to share tracking algorithms, the scope can be flexible

Using the phystat.org Repository

• www.phystat.org, organized using Plone
• The main page has:
  – How-to instructions (and links) for
    • Finding packages
    • Submitting/modifying a package
    • Commenting, validating, and so forth
  – Links to all the PHYSTAT conferences
  – Links to related web resources
  – Navigation to each type of package
  – Search tools

Phystat.org (slide image not reproduced)

Using the phystat.org Repository

• Navigation leads to several types of page:
  – Package lists
    • Created dynamically as a result of searches or selection of package categories
    • Contain names and one-line descriptions
  – Package pages
    • Full description of one package
    • Download button
  – Submit-a-package form
    • Fields for descriptions, uploads

Using the phystat.org Repository

• Searches by
  – Category
    • Executable utility, library, code fragment, ROOT macro, …
  – Language
    • C++, R, Python, Fortran, …
  – Purpose
    • Fitting, categorization, hypothesis testing
  – Keywords
• Package pages
  – Description
  – Download
    • Multiple versions allowed
  – User discussion
  – Validation links
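The search criteria above amount to filtering package records on a few metadata fields. The sketch below shows one way to do that. It assumes a hypothetical `Package` record and `search` function, which are illustrations rather than phystat.org's actual Plone data model. A criterion left as `None` matches anything. Keyword matching is case-insensitive, and every requested keyword must be present.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Package:
    """Hypothetical package record holding the fields the slides search on."""
    name: str
    one_line: str           # the one-line description shown in package lists
    category: str           # e.g. "Library", "Code Fragment", "ROOT macro"
    languages: Set[str]     # e.g. {"C++", "Python"}
    purposes: Set[str]      # e.g. {"Fitting"}
    keywords: Set[str] = field(default_factory=set)


def search(packages: Iterable[Package],
           category: Optional[str] = None,
           language: Optional[str] = None,
           purpose: Optional[str] = None,
           keywords: Iterable[str] = ()) -> List[Package]:
    """Return packages matching every criterion that was given."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for p in packages:
        if category is not None and p.category != category:
            continue
        if language is not None and language not in p.languages:
            continue
        if purpose is not None and purpose not in p.purposes:
            continue
        # Every requested keyword must appear among the package's keywords.
        if wanted and not wanted <= {k.lower() for k in p.keywords}:
            continue
        hits.append(p)
    return hits
```

A package list page could then be rendered with a loop such as `for p in search(pkgs, language="C++"): print(f"{p.name}: {p.one_line}")`, which pairs each name with its one-line description.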

Submitting Content

• The author should prepare:
  – A package name
  – A one-line description (suitable for reading in lists of packages)
  – A full description (a paragraph that lets users decide whether to download)
  – A tarball containing
    • Code (if applicable)
    • Build tools (if applicable)
    • Documentation (if available)
    • Test/sample data (if available)
    • Scripts that would reproduce figures from a paper (if applicable)
  – Answers to:
    • Type, purpose, language, platforms
    • Pulldowns make entering these easy
  – (Optional) keywords
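A submission form could check the items listed above before passing the package to a moderator. The sketch below makes several assumptions. The field names in `REQUIRED` and the function `check_submission` are hypothetical, and the upload is assumed to be identified by its filename. Keywords are treated as optional, as the slide states. This illustrates the checklist and is not the site's actual form handling.

```python
from typing import Dict, List

# Required items from the submission slide; the key names are hypothetical.
REQUIRED = ("name", "one_line_description", "full_description",
            "tarball", "type", "purpose", "language", "platforms")


def check_submission(form: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the form looks complete."""
    problems = [f"missing {key}" for key in REQUIRED if not form.get(key)]
    one_line = form.get("one_line_description")
    if isinstance(one_line, str) and "\n" in one_line.strip():
        problems.append("one-line description spans several lines")
    tarball = form.get("tarball")
    if isinstance(tarball, str) and not tarball.endswith((".tar.gz", ".tgz", ".tar")):
        problems.append("upload does not look like a tarball")
    return problems
```

Returning a list of messages, rather than stopping at the first error, lets a busy submitter fix everything in one pass. That fits the "five minutes or less" goal stated for submissions.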

Submitting Content

• “Come as you are” philosophy
  – We don’t want to discourage busy physicists from submitting citable work because its documentation is in poor shape
• Goal: submitting a prepared package should take five minutes or less
  – Check boxes for type, purpose, language
  – Pulldown list for keywords
• A package becomes publicly visible after a moderator verifies that it is suitable

Policies

• This is a code (and papers) repository
  – Packages contain source code and/or technical or theoretical documentation
  – Build instructions and files should be included where relevant
  – phystat.org does not distribute executables
• (Loose) content control
  – Must be relevant to some area of physics
  – Must be related to statistics, probability, fitting, categorization, or a similar area
  – The moderator(s) are not trying to be judges of quality
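Because phystat.org does not distribute executables, a moderator might want a quick screen for compiled binaries in an uploaded tarball. The hypothetical helper `find_executables` below compares each file's leading bytes with the ELF, Windows PE ("MZ"), and Mach-O signatures. It uses only Python's standard `tarfile` module. This is a heuristic. For example, a text file that happens to begin with "MZ" would be flagged, and other binary formats are not detected. It is not a procedure the repository is known to have used.

```python
import tarfile
from typing import List

# Leading bytes of common executable formats: ELF, DOS/Windows PE, and
# 32/64-bit Mach-O in both byte orders.
_MAGIC = (b"\x7fELF", b"MZ",
          b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf",
          b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe")


def find_executables(tarball_path: str) -> List[str]:
    """List regular files in the tarball whose first bytes match a binary signature."""
    flagged = []
    with tarfile.open(tarball_path, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar.getmembers():
            if not member.isfile():
                continue
            f = tar.extractfile(member)
            if f is None:
                continue
            if f.read(4).startswith(_MAGIC):
                flagged.append(member.name)
    return flagged
```

The flagged names are only a prompt for human review, which matches the loose, non-judgmental moderation described above.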

Policies

• License issues
  – Submitters must agree to let our site freely distribute the package (of course)
  – Submissions may attach whatever license agreements they wish
    • As long as we can distribute the package
  – The author, not the repository, is responsible for any enforcement of copyright and license terms
• Repository “held harmless” against improper use by downloaders

Policies

• Steering Committee
  – 5-10 people active in statistics in physics
  – Probable initial configuration includes:
    • Jim Linnemann (initial chair) (ATLAS, D0)
    • Louis Lyons (CDF)
    • Harrison Prosper (D0, CMS, Cosmology)
    • Glen Cowan (PDG statistics editor)
    • Kyle Cranmer (ATLAS)
    • Roger Barlow (BaBar)
  – Meets primarily by e-mail
  – Sets policies, directions of value-added work, and so forth

Repository Support Activities (“Phase I of Phystat”)

• Establishment of the web site
  – With mechanisms for browsing, submission/updating, and discussion
  – With assignment of submission numbers suitable for use as citations in papers
• Licensing and filtering policies
  – Must satisfy FNAL/DOE criteria
• Community consensus on content policies
  – And formation of a steering committee
• Dissemination of information about Phystat

Value-Added Activities (“Phase II”)

• These are all potential activities
  – Depending on community desires and time available
  – Some done by supporters/moderators
  – Others depend on participation by outside physicists
• Classification/validation related:
  – Distinguishing actively maintained, usable packages from archival entries
  – Organizing a synopsis of user feedback
  – Lists of known working platforms for packages
  – Basic functional validation/certification
  – Organization of community comparisons among packages

Possible Value-Added Activities (“Phase II”)

• Extending scope
  – Keep a “code wanted” list
    • People express needs for specific capabilities
  – Looking for and interfacing to relevant software produced by the statistics community
  – Blobel: how about mathematical methods?
• Improving capabilities
  – Integrating related packages
  – Soliciting/supporting/adding extensions to submitted code
  – Portability enhancements

You can make phystat.org Valuable

• Add to the contents of phystat.org
  – Submit packages to be disseminated
  – Submit code fragments defining how your analysis did statistics
    • You can reliably cite your submitted code by its phystat number, much like a paper in arXiv: Prosper et al., phystat.org/0603004 or Prosper et al., phystat.org/0603004v2
  – Submit documents explaining choices of statistical approaches
• phystat.org is pretty empty today
  – But there is a large backlog of code and tools potentially valuable to the HEP community!

What Next

• Make use of phystat.org!
  – Browse for packages you may be able to use
  – Browse to see how various experiments tackled your statistics issues
  – Use the repository to download versions of major packages
• Add value to packages
  – Validation and endorsement comments
  – Report problems and make suggestions
• Comment about repository mechanics