Program | Librarians Building Momentum for Reproducibility

THIS EVENT IS PAST!

You can view a full playlist of the talks here: https://www.youtube.com/playlist?list=PLRgd5xHgcpG_aLrVXy3Wu7HxbobGX0DPo or view the links on the program to see videos for individual talks. All of the slides are housed in our OSF for Meetings instance here: https://osf.io/meetings/LBM4R.

You can view the conversation that happened on Twitter under the hashtag #ReproLibs. You can also view and use the materials in the collaborative note-taking document that more than 99 people took notes in: https://osf.io/f7vzd/.

Time (EST) Title Presenter(s)
12:00 pm Welcome & Introductions [slides] [video] Organizing Committee
12:10 pm Keynote: Librarians & Reproducibility — It's Time! [slides] [video] Melissa Rethlefsen
12:40 pm BREAK
12:50 pm Full Talks
Engaging undergraduates on issues of reproducibility [slides] [video]
As librarians and libraries become more involved with all of the phases of the research process, we are in a unique position to introduce emerging researchers to the complexities of the reproducibility crisis. We should seek to introduce undergraduate students to the causes of the crisis, the issues surrounding it, and potential ways to effect change. No one person or institution is solely responsible, and the necessary changes will only come via collective action and activism; thus the sooner we educate our students about the nuances of the crisis, the better. This presentation will explore ways that librarians can introduce undergraduates to the reproducibility crisis via instruction. The relationship between reproducibility concepts and the ACRL Framework for Information Literacy will be investigated. Examples within library instruction, both subject-specific and general, will be reviewed, with particular focus on undergraduate research programs and events, as well as scaffolded library instruction within departments' curricula. All examples and ideas will be specifically focused on the undergraduate student population, though many could also be used with graduate students or others in academia. Participants will gain concrete ideas, practical suggestions, and helpful resources.
Nicole Helregel
Identifying Barriers to Reproducibility: An Ethnographic Study of a Hands-on Workshop [slides] [video]
In recent years, alarms have been raised over instances of irreproducibility in science, including biomedical research. Overcoming this issue requires understanding the practical challenges and barriers that contribute to reproducibility problems. This talk will describe some of the factors contributing to the lack of reproducibility in the biomedical literature, based on the experiences of researchers participating in a hands-on reproducibility workshop hosted by the National Library of Medicine. During the three-day workshop, teams attempted to reproduce the results of published studies that made their data and code available in publicly accessible repositories. The workshop also included brief lectures on tools for reproducibility, like Docker containers and Git, that participants could use to make their own research more reproducible. Of the ten teams across two workshops, none could fully reproduce the studies. This study takes a grounded theory ethnographic approach to understanding the processes researchers employ when attempting to reproduce research results, thereby elucidating the characteristics of publications and their related data and software code that lead to breakdowns in these processes and, ultimately, to irreproducible results. This talk will also discuss how the issues identified as contributing to irreproducibility could be addressed by librarians working with research teams.
Lisa Federer, Maryam Zaringhalam
Weaving Reproducibility Education into a PhD Life Sciences Curriculum [slides] [video]
Background: Since 2016, our academic health sciences library has had a teaching role in our institution's PhD program in biomedical sciences, offering a required 1-credit research skills class. Following changes to the NIH's policy on rigor and reproducibility training, we have changed our curriculum from a general research skills class to a focused curriculum on Rigor and Reproducibility. The NIH's rigor and reproducibility policy requires training in topics like experimental replicability, scientific premise, authentication of biological resources, sex as a biological variable, and computational reproducibility, so we have built class materials around these concepts. Through our extended relationship with the department, the library has received additional opportunities to weave computational reproducibility into the PhD curriculum. Methods: We expanded on some educational topics from our old research skills class, removed certain topics, and reframed others around rigor and reproducibility. For example, experimental replicability was expanded from a one-class topic into two sessions, and literature searching was reframed around rigorous scientific premise. We focused new lessons on topics in experimental replication, the rigor of a research question's scientific premise, computational reproducibility, and critical appraisal of the replicability of published literature, and provided a guest lecture on authentication of key resources. We administered both formative and summative knowledge assessments, as well as an evaluation of student attitudes. Results: Our experiences have highlighted that, on average, students perceived the new, unified curriculum as consistently more useful and had more positive things to say about the class than students in past semesters. In particular, sessions on replicable research, research data management, and computational reproducibility were all rated as more useful than similar topics in past years. Moreover, teaching a unified class rather than general skills felt more holistic and purposeful to the instructors in terms of improving student outcomes.
Fred LaPolla, Alisa Surkis
'Rigor and Reproducibility in Research' data support instruction at UC Berkeley [slides] [video]
Responsible Conduct of Research (RCR) instruction is increasingly required by funders as a means of addressing reproducibility, transparency, and ethics in the research process. This presentation will discuss an innovative collaboration among science and data librarians, together with Research IT staff, to develop and deliver a "Support Your Data" module for a "Rigor and Reproducibility in Research" course. This course is part of an RCR curriculum at UC Berkeley that is required for incoming Molecular & Cell Biology, Chemical Biology, and Bioengineering graduate students, but it is also relevant to researchers from a wide range of disciplines and funding sources. RCR data support instruction offers liaison librarians opportunities to go beyond traditional services and align our roles with individual and institutional research priorities. Our collaboration, combining the strengths of subject and functional specialists, crosses multiple disciplines (currently the sciences and social sciences, with potential for more) and may lead to future opportunities for engagement throughout the research lifecycle in support of reproducibility. We have found that contributions to RCR courses can be implemented readily by librarians together with campus partners, can enable instructors to share best practices and ensure the consistency of messages, and can multiply potential points of contact with the students, research staff, and faculty we serve. The "Support Your Data" module was successfully delivered in Fall 2018 and Fall 2019, and there are plans to continue and expand the program.
Elliott Smith, Anna Sackmann, Ann Glusker
What's sauce for the goose is sauce for the gander: Reproducible practice in library work [slides] [video]
Librarians are increasingly concerned with promoting and supporting research reproducibility on their campuses. However, if librarians want to encourage researchers to use reproducible practices, we should also be using reproducible practices in our own work. For librarians who work with library collections data, there is an obvious opportunity to practice reproducible behaviors. Making collections data analysis more repeatable improves librarians' work by reducing the time it takes and increasing its consistency. It also gives librarians more empathy for what researchers experience as they attempt to move their current research behaviors in a more reproducible direction. In this lightning talk, we will describe and share templates we have created for documenting our own collections analysis projects. The templates were made with open source tools, including Markdown and R, which are often recommended to researchers (see the illustrative sketch below).
Heidi Tebbe, Danica Lewis
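The presenters' actual templates are linked from the talk's slides. Purely as a rough illustration of the idea, here is a minimal sketch in R of what a self-documenting collections analysis script might look like; the file names, columns, and fields are hypothetical and not taken from the talk.

    # collections_analysis_template.R -- hypothetical sketch of a self-documenting
    # collections data analysis (not the presenters' template). Each run records
    # its input, a checksum, and the computing environment alongside the output.

    library(tools)                                # base R, provides md5sum()

    input_file <- "data/circulation_2019.csv"     # hypothetical raw export
    output_dir <- "output"
    dir.create(output_dir, showWarnings = FALSE)

    # 1. Record provenance: when the analysis ran, on which file, with what checksum
    provenance <- data.frame(
      run_date   = as.character(Sys.Date()),
      input_file = input_file,
      input_md5  = unname(md5sum(input_file)),
      stringsAsFactors = FALSE
    )
    write.csv(provenance, file.path(output_dir, "provenance.csv"), row.names = FALSE)

    # 2. The analysis itself: a simple, rerunnable summary of loans per call-number class
    circ <- read.csv(input_file, stringsAsFactors = FALSE)
    summary_tbl <- aggregate(loans ~ lc_class, data = circ, FUN = sum)
    write.csv(summary_tbl, file.path(output_dir, "loans_by_class.csv"), row.names = FALSE)

    # 3. Capture the computing environment so the run can be repeated later
    writeLines(capture.output(sessionInfo()), file.path(output_dir, "session_info.txt"))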
A Primer on Preregistration (& why I think it should be a submission track in LIS journals) [slides] [video]
Bishop (2019) refers to the following as the four horsemen of the reproducibility apocalypse: hypothesizing after results are known (HARKing), p-hacking, underpowered studies, and publication bias. A potential way to thwart these threats to science is to preregister study methods and analysis plans via preregistration venues (e.g., OSF Registries, PROSPERO). Another option that more and more journals are adopting is a registered reports submission track, in which methods are peer reviewed instead of results: journals critique the proposed methods and grant conditional acceptance before results are known, which prevents them from declining to publish null or negative findings. This session will explain what this means for academic librarianship and LIS scholarship.
Amy Riegelman
2:20 pm Questions for morning sessions [video]
2:30 pm BREAK
2:40 pm Breakout Activity: description Organizing Committee
3:25 pm BREAK
3:35 pm Full Talks
Saving Data Journalism: Using ReproZip-Web to Capture Dynamic Websites for Future Reuse [slides] [video]
Data journalism stories are a way for newsrooms to tell innovative and complex stories through data, and to visualize that data in a way that allows readers to interact with and explore the numbers behind the story. Iconic examples of these works are being produced by legacy newsrooms like The New York Times (via The Upshot), digital-first organizations like ProPublica, and hundreds of other organizations around the world. Yet current web archiving tools cannot capture the source files that underlie these apps, which makes such applications hard to archive and preserve. As a result, they are being lost. We present ReproZip-Web, an open-source prototype aimed at saving these news applications from extinction. ReproZip-Web leverages ReproZip, a computational reproducibility tool, and Webrecorder, a tool for recording web resources, to automatically and transparently capture and replay dynamic websites. The prototype creates a bundle that contains all the information needed to reproduce a news application, and its lightweight nature makes it ideal for distribution and preservation. This talk will present ongoing work on the prototype and discuss some use cases and avenues for future development (a sketch of the underlying ReproZip workflow follows below).
Katy Boss
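ReproZip-Web itself is a prototype whose interface is described in the talk; the sketch below only illustrates the general ReproZip pack-and-replay cycle it builds on, driven from R for consistency with the other examples on this page. It assumes the reprozip and reprounzip-docker command-line tools are installed, and the application command and file names are hypothetical.

    # Hypothetical illustration of the ReproZip cycle that ReproZip-Web builds on.
    # The news-app command and file names below are made up for the example.

    # 1. Trace the running application so its files, libraries, and configuration are recorded
    system("reprozip trace python run_news_app.py")

    # 2. Pack the trace into a single self-contained .rpz bundle for distribution and preservation
    system("reprozip pack news_app.rpz")

    # 3. Later (or elsewhere), unpack and replay the bundle inside a Docker container
    system("reprounzip docker setup news_app.rpz news_app_replay")
    system("reprounzip docker run news_app_replay")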
Container images for research librarians 101 [slides] [video]
Containerisation is a hot topic for increasing the reproducibility of research. A container can capture a complex computing environment in the exact state that was used to run a scientific workflow. Based on containers, infrastructures are being built for researchers to improve communication, collaboration, and reproducibility (cf. https://www.nature.com/articles/d41586-019-03366-x). With their increasing use, containers are likely to become a topic for research librarians, even though their background in scalable cloud information technology is distinctly at odds with traditional library services and perspectives. Nevertheless, libraries could provide guidelines for container usage and build infrastructures to leverage the advantages of a containerised workflow, but they might also have to handle container-related artefacts as products of research projects. In this talk, I give an introduction to the basics of containerisation and examine the innards of container image formats (Docker/OCI image, Singularity image) and how these images are created (Dockerfile, build scripts, the duality of image and Dockerfile; see the sketch below). Further, I will give an overview of the state of the art in container preservation (others' work, e.g. https://doi.org/10.2218/ijdc.v12i2.509 and Rechert et al. in https://doi.org/10.11588/heibooks.285.377). Based on my perspective as a research software engineer, I will speculate on the potential and the challenges of container archival and preservation, and on how, in ten years, a container image might help with the inspection of research published today.
Daniel Nüst
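As a minimal companion to this entry (not taken from the talk), the sketch below illustrates the recipe/artifact duality the abstract mentions: the Dockerfile is the human-readable recipe, the image is the built artifact, and a run reproduces the captured environment. It is driven from R to keep one language across this page's examples; it assumes Docker is installed, and the base image tag, script, and image names are illustrative.

    # Hypothetical sketch of the Dockerfile/image duality discussed in the talk.
    # Assumes Docker is installed and an analysis.R script sits in the project root.

    # Write a tiny Dockerfile: the human-readable recipe for the environment
    writeLines(c(
      "FROM rocker/r-ver:3.6.2",            # pinned base image with a fixed R version
      "COPY analysis.R /analysis.R",        # copy the analysis script into the image
      "CMD [\"Rscript\", \"/analysis.R\"]"  # default command when the container runs
    ), "Dockerfile")

    # Build the image: the binary artifact a library might one day need to preserve
    system("docker build -t collections-analysis:2020-02 .")

    # Run the image: the captured environment executes the workflow exactly as built
    system("docker run --rm collections-analysis:2020-02")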
R is for Raising Awareness: Strategizing mindful changes using popular tools [slides] [video]
As peers in the research data science world, we are well versed in the need for reproducibility, we have knowledge of international tools and efforts, and we may often work in settings where "reproducibility" is built into the vernacular and those we serve come to us with a vested interest in furthering their own reproducibility efforts. However, as we move into new roles or serve new populations, we may find that our biggest obstacle is the simplest: raising awareness. At a large, diverse, public R1 institution where the focus is fixed on raising the bar in grants, publications, and research productivity, attention and energy for sharing and preserving research are often hard to garner. In this presentation, we will start a conversation about raising awareness and fostering knowledge indirectly, by meeting students, faculty, and colleagues around other relevant needs. In this case, we will share ideas for using initiatives around popular tools such as R, in multiple ways, to facilitate these dialogs. Many wonderful community initiatives around R training for open science already exist and may lend themselves well to programmatic use. Like sneaking vegetables into your family's dinner, the concept of "good for you" can start to shift into something enjoyable if it comes in palatable forms.
Reid Boehm, Taylor Davis-Van Atta
Ethical and Practical Considerations for Sharing Protected and At-Risk Data [slides] [video]
Data sharing is becoming an increasingly prevalent and expected part of the research process. Researchers may be hesitant to share datasets about human subjects, or about certain plant and animal species, that are considered protected data. Additionally, there may be companies, sites, or other entities that would be put at risk if certain research data about them were shared publicly. This presentation will address conflicts that can arise when attempting to balance the protection of data with expectations for open data, such as restrictive language in data use agreements, IRB protocols, and consent forms. Audience participants will learn about curation skills, language for reassuring nervous researchers, and institutional-level efforts to ensure the safe sharing of protected data.
Shanda Hunt, Shannon Farrell, Katie Wilson, Caitlin Bakker, Alicia Hofelich Mohr
Reproducibility in the Collection of Historical Primary Sources [slides] [video]
In a 2016 essay for The American Historical Review, Lara Putnam wrote that "Source digitization has transformed historians' practice…. For the first time, historians can find without knowing where to look. As a result, at an unprecedented rate we are finding connections in unexpected places: powering publication on mobile ideas and international audiences; circuits, networks, and border-crossing flows. Technology has exploded the scope and speed of discovery." As primary sources become more accessible and historians are able to consult more material from more archives than ever before, the process of data collection in historical research risks becoming even more opaque than it was for our predecessors. Historians—members of a discipline with allegiances to both the humanities and social sciences—have largely ignored calls for reproducibility. However, I contend that at least some facets of reproducible research practices are increasingly necessary for historians in order to address the unprecedented access to and volume of sources that we encounter in what Roy Rosenzweig has termed our "age of abundance." In this presentation, I will outline the typical practices of primary-source data collection in the discipline of history. I will then explore which practices from the reproducible research community hold promise for the field, with a special focus on systematizing how historians consult archival materials, reporting "negative" results, and recording the uneven terrain of archival systems as archives and special collections slowly but surely digitize their materials. I invite the community to help me consider what other promising practices for reproducibility in research can and should guide the future of studying the past.
Benjamin Wiggins
Investigating Emulation as a Service for Reproducible Research at Yale [slides] [video]
Over the past year, representatives from the EaaSI (Emulation-as-a-Service Infrastructure) program hosted at Yale University and Yale's Institution for Social and Policy Studies have been working together to investigate EaaSI's tools as a curation and access service for research data deposited in the ISPS Data Archive. Emulation promises a novel approach for reproducing computational research results tied to legacy and proprietary software; by supporting legacy research software, packaging systems, and the operating systems and runtimes they depend on, emulation can address software "rot" and provide access without losing dependencies or rewriting code. Using Yale's existing collections of "obsolete" software, three potential paths have so far emerged within EaaSI's program of work: manually recreating the software environment of deposited data sets using EaaSI's core interface for managing and delivering emulations via a web browser; using EaaSI's in-development Universal Virtual Interactor (UVI) to automatically recommend and render data in already-existing emulated environments; and packaging data using ReproZip and directly importing ReproZip packages into an appropriate, contemporary emulated computing environment. Ongoing discussion has also taken place on how to mesh Emulation-as-a-Service with ISPS' existing open-source web app YARD (Yale Application for Research Data). How can researchers and data curators assist with software preservation and access via emulation at the point of deposit or review? This presentation will summarize Yale's findings on these efforts, incorporating quick demonstrations with examples from the ISPS Data Archive, to illustrate the advantages of Emulation-as-a-Service for research data management.
Ethan Gates, Limor Peer
5:05 pm Questions for afternoon session [video]
5:15 pm BREAK
5:20 pm Lightning Talks
Cultural barriers to reproducibility in the lab [slides] [video]
Given the growing attention to reproducibility in the biomedical sciences, there is an increasing need for reproducibility training for biomedical researchers. At the University of California, San Francisco, the Library has taught a series of hands-on workshops on practices like sharing data, methods, and code, publishing in open access journals or preprint archives, and designing rigorous experiments. While practical training is sorely needed, we noticed throughout our workshops that there were often cultural barriers that made it hard for students and trainees to implement more reproducible practices in a lab setting. Learners would say things like "I would love to share my data but my PI would never go for it" or "my collaborators don't see the value of rigor, whatever generates a paper is fine." This talk will summarize the cultural barriers that learners shared and help librarians brainstorm ways to promote reproducibility in the hierarchical, results-oriented culture of science.
Ariel Deardorff
Reproducibility of computer searches in systematic reviews: checklist items used to assess computer search reports [slides] [video]
For optimum confidence in the results of systematic reviews (SR), readers need to be confident that the computer searches (CS) used for the SR are reproducible. Readers depend on what is reported in the SR to assess the methods used for the CS, and checklists can be used for such assessments. A Computer Search Report Checklist (CSRC) was developed and used to evaluate computer searches, and in 2015 the CSRC, AMSTAR, and PRISMA checklist items were compared to note how each supported the assessment of computer search reports for reproducibility or comprehensiveness. Results from that comparison were presented at the APA's annual convention in Toronto (libguides.library.kent.edu/apacsrc). Since 2015, considerable work has focused on the development of checklist items, and this lightning talk presents results from an updated comparison. A content analysis looks at the following resources: 1. AMSTAR-2 (amstar.ca); 2. a "comprehensive" item bank recently developed for evaluating SR (doi.org/10.1016/j.jclinepi.2019.11.010); 3. PRISMA-S (doi.org/10.17605/OSF.IO/YGN9W); 4. QAESR (a CSRC subset - libguides.library.kent.edu/csrcproject/reproducibility); 5. the SRS checklist (vbn.aau.dk/en/publications/an-instrument-for-evaluating-searches-for-systematic-reviews-the-). The goal is to examine how these checklists and items support the recording and assessment of computer search report information related to the reproducibility of CS. The results can form a basis for conversations focused on helping readers of SR evaluate computer search reports and make decisions about the reproducibility of the computer searches used.
Paul Fehrmann
What traditional scholarly communications can't tell: Hidden research data analysis processes and how librarians can help researchers facilitate process sharing [slides] [video]
In traditional scholarly communications, only peer-reviewed articles, as the final version of research, are shared with readers, yet research is an iterative process of arriving at a conclusion. In order to make research more open and transparent, there have been open scholarship (or open science) initiatives such as preregistration, open access, open peer review, open data, preprints and postprints, and authors' rights, to name a few. In this lightning talk, drawing on my research collaboration with a professor (https://doi.org/10.1108/OIR-01-2019-0014), I would like to highlight the importance of sharing research data analysis processes, which include not only the final statistical method applied in the research but also other approaches that were considered yet disregarded in the final publication (a small illustrative sketch follows below). In addition, I would like to share which open source tools (R, RStudio, R packages) were used to make the research reproducible or replicable. Lastly, the lightning talk addresses how librarians can help researchers participate in open scholarship initiatives through direct collaboration, workshops, and services.
Yoo Young Lee
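The collaboration's actual code is linked from the talk; purely as a hedged illustration of keeping disregarded analysis paths visible in a rerunnable script, here is a minimal R sketch. The data, variables, and choice of tests are hypothetical and not drawn from the published study.

    # Hypothetical sketch: record the analysis paths that were considered, not just
    # the one reported, so the full decision process stays visible and rerunnable.

    set.seed(2020)                                   # fix randomness for reproducibility
    dat <- data.frame(group = rep(c("A", "B"), each = 50),
                      score = c(rnorm(50, 10), rnorm(50, 11)))

    # Path 1 (considered, then set aside): parametric comparison of group means
    t_test_result <- t.test(score ~ group, data = dat)

    # Path 2 (the one reported): non-parametric alternative, kept alongside path 1
    wilcox_result <- wilcox.test(score ~ group, data = dat)

    # Record both results and the environment so readers can retrace the choice
    saveRDS(list(t_test = t_test_result, wilcoxon = wilcox_result), "analysis_paths.rds")
    writeLines(capture.output(sessionInfo()), "session_info.txt")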
Reproducibility & Qualitative Data Analysis [slides] [video]
Vicky Steeves
Momentum via ReproducibiliTea Journal Clubs [slides] [video]
ReproducibiliTea was first launched as a reproducibility-themed journal club at Oxford, but it is now international, with many chapters. The Minnesota Chapter was launched in September 2019 with support from a librarian. This lightning session will summarize the impetus, successes, and lessons learned from the Minnesota Chapter of ReproducibiliTea.
Amy Riegelman
5:50 pm End discussion & wrap up [slides] [video] Organizing Committee
6:00 pm DONE! Thanks for coming!