Data Management for Researchers

Kristin Briney

(2015)

Abstract

A comprehensive guide to everything scientists need to know about data management, this book is essential for researchers who need to learn how to organize, document and take care of their own data.

Researchers in all disciplines are faced with the challenge of managing the growing amounts of digital data that are the foundation of their research. Kristin Briney offers practical advice and clearly explains policies and principles, in an accessible and in-depth text that will allow researchers to understand and achieve the goal of better research data management.

Data Management for Researchers includes sections on:

* The data problem – an introduction to the growing importance and challenges of using digital data in research. It covers both the inherent problems of managing digital information and the ways the research landscape is changing to give more value to research datasets and code.

* The data lifecycle – a framework for data’s place within the research process and how data’s role is changing. Greater emphasis on data sharing and data reuse will change not only the way we conduct research but also how we manage research data.

* Planning for data management – covers the many aspects of data management and how to put them together in a data management plan. This section also includes sample data management plans.

* Documenting your data – an often overlooked part of the data management process, but one that is critical to good management; data without documentation are frequently unusable.

* Organizing your data – explains how to keep your data in order using organizational systems and file naming conventions. This section also covers using a database to organize and analyze content.

* Improving data analysis – covers managing information through the analysis process. This section starts by comparing the management of raw and analyzed data and then describes ways to make analysis easier, such as spreadsheet best practices. It also examines practices for research code, including version control systems.

* Managing secure and private data – many researchers are dealing with data that require extra security. This section outlines what data fall into this category and some of the policies that apply, before addressing the best practices for keeping data secure.

* Short-term storage – deals with the practical matters of storage and backup and covers the many options available. This section also goes through the best practices to ensure that data are not lost.

* Preserving and archiving your data – digital data can have a long life if properly cared for. This section covers managing data in the long term including choosing good file formats and media, as well as determining who will manage the data after the end of the project.

* Sharing/publishing your data – addresses how to make data sharing across research groups easier, as well as how and why to publicly share data. This section covers intellectual property and licenses for datasets, before ending with the altmetrics that measure the impact of publicly shared data.

* Reusing data – as more data are shared, it becomes possible to use outside data in your research. This section discusses strategies for finding datasets and lays out how to cite data once you have found them.

This book is designed for active scientific researchers, but it is useful for anyone who wants to get more from their data: academics, educators, professionals or anyone who teaches data management, sharing and preservation.

"An excellent practical treatise on the art and practice of data management, this book is essential to any researcher, regardless of subject or discipline." —Robert Buntrock, Chemical Information Bulletin


... recommended as a textbook for graduate-level research techniques courses. It's an important resource for academic and special library shelves and a vital reference for anyone working with data.


Kristen LaBonte

Briney has written a useful primer on data management for researchers which provides practical advice throughout on managing data. It is easy to read and clearly structured. http://www.ariadne.ac.uk/issue75/cole


Gareth Cole, Loughborough University Library

For researchers and consumers of data who are often fraught with managing excess information, Briney's book offers valuable techniques, strategies and standards to help achieve proficient data management and successful outcomes. This book can be useful to both novice researchers and well-established scientists alike. 


Mary F. Miles

Kristin Briney has a PhD in physical chemistry and a Master’s degree in library and information studies from the University of Wisconsin-Madison, and currently works in an academic library, advising researchers on data management planning. Her blog can be found at www.dataabinitio.com.


Apparently, NASA lost much of the early data from space exploration, including high quality video footage of the first moon landing. All the more reason to do as it says in the sub-title to the book.


Alan Crowden

Kristin Briney’s Data Management for Researchers is a book that should be on the shelf (physical or virtual) of every librarian, researcher and research administrator. Scientists, engineers, social scientists, humanists — anyone whose work involves generating and keeping track of digital data. This is the book for you.

.... I recommend this book without hesitation for all academic libraries. Individual researchers, research administrators, funding agency employees and academic librarians would all find much useful information. Simply giving a copy to new graduate students is probably a worthwhile investment at any institution.

http://scienceblogs.com/confessions/2016/01/11/reading-diary-data-management-for-researchers-organize-maintain-and-share-your-data-for-research-success-by-kristin-briney/


John Dupuis, York University Library, Toronto

Table of Contents

CONTENTS
ABOUT THE AUTHOR
ACKNOWLEDGEMENTS
1 THE DATA PROBLEM
1.1 WHY IS EVERYONE TALKING ABOUT DATA MANAGEMENT?
1.2 WHAT IS DATA MANAGEMENT?
1.2.1 Defining data
1.2.2 Defining data management
1.3 WHY SHOULD YOU DO DATA MANAGEMENT?
2 THE DATA LIFECYCLE
2.1 THE DATA LIFECYCLE
2.1.1 The old data lifecycle
2.1.2 The new data lifecycle
2.2 THE DATA ROADMAP
2.2.1 Following the data roadmap
2.3 WHERE TO START WITH DATA MANAGEMENT
2.4 CHAPTER SUMMARY
3 PLANNING FOR DATA MANAGEMENT
3.1 HOW TO PLAN FOR DATA MANAGEMENT
3.1.1 The importance of planning for data management
3.1.2 How to customize data management to your needs
3.2 CREATING A DATA MANAGEMENT PLAN
3.2.1 Why create a written data management plan?
3.2.2 What a data management plan covers
3.2.3 Creating a data management plan for your research
3.3 DATA POLICIES
3.3.1 Types of policies and where to find them
3.3.2 Data privacy policies
3.3.3 Data retention policies
3.3.4 Data ownership policies
3.3.5 Data and copyright
3.3.6 Data management policies
3.3.7 Data sharing policies
3.4 CASE STUDIES
3.4.1 Example data management plan for a Midwest ornithology project
3.4.2 My data management plan for this book
3.5 CHAPTER SUMMARY
4 DOCUMENTATION
4.1 RESEARCH NOTES AND LAB NOTEBOOKS
4.1.1 Taking better notes
4.1.2 Laboratory notebooks
4.1.3 Electronic laboratory notebooks
4.2 METHODS
4.2.1 Definition of methods
4.2.2 Evolving protocols
4.2.3 Managing methods information
4.3 OTHER USEFUL DOCUMENTATION FORMATS
4.3.1 README.txt files
4.3.2 Templates
4.3.3 Data dictionaries
4.3.4 Codebooks
4.4 METADATA
4.4.1 When to use metadata versus notes
4.4.2 The basics of metadata
4.4.3 Adopting a metadata schema
4.5 STANDARDS
4.5.1 General standards
4.5.2 Scientific standards
4.6 CHAPTER SUMMARY
5 ORGANIZATION
5.1 FILE ORGANIZATION
5.1.1 Organizing digital information
5.1.2 Organizing physical content
5.1.3 Organizing related physical and digital information
5.1.4 Indexes
5.1.5 Organizing information for collaborations
5.1.6 Organizing literature
5.2 NAMING CONVENTIONS
5.2.1 File naming
5.2.2 File versioning
5.3 DOCUMENTING YOUR CONVENTIONS
5.3.1 What to document
5.3.2 Where to document
5.4 DATABASES
5.4.1 How databases work
5.4.2 Querying a database
5.5 CHAPTER SUMMARY
6 IMPROVING DATA ANALYSIS
6.1 RAW VERSUS ANALYZED DATA
6.1.1 Managing raw and analyzed data
6.1.2 Documenting the analysis process
6.2 PREPARING DATA FOR ANALYSIS
6.2.1 Data quality control
6.2.2 Spreadsheet best practices
6.3 MANAGING YOUR RESEARCH CODE
6.3.1 Coding best practices
6.3.2 Version control
6.3.3 Code sharing
6.4 CHAPTER SUMMARY
7 MANAGING SENSITIVE DATA
7.1 TYPES OF SENSITIVE DATA
7.1.1 National data privacy laws
7.1.2 Ethics and sensitive data
7.1.3 Other data categorized as sensitive
7.2 KEEPING DATA SECURE
7.2.1 Basic computer security
7.2.2 Access
7.2.3 Encryption
7.2.4 Destroying data
7.2.5 Personnel
7.2.6 Training and keeping a security plan
7.2.7 Summarization of the dos and don’ts
7.3 ANONYMIZING DATA
7.3.1 Types of personally identifiable information
7.3.2 Masking data
7.3.3 De-identifying data
7.3.4 Other anonymization considerations
7.4 CHAPTER SUMMARY
8 STORAGE AND BACKUPS
8.1 STORAGE
8.1.1 Storage best practices
8.1.2 Storage hardware
8.1.3 Choosing storage
8.1.4 Physical storage
8.2 BACKUPS
8.2.1 Backup best practices
8.2.2 Backup considerations
8.2.3 Test your backups
8.2.4 Backing up analog data
8.3 CASE STUDIES
8.4 CHAPTER SUMMARY
9 LONG-TERM STORAGE AND PRESERVATION
9.1 WHAT TO RETAIN AND HOW LONG TO RETAIN IT
9.1.1 Data retention policies
9.1.2 Common sense data retention
9.2 PREPARING YOUR DATA FOR THE LONG TERM
9.2.1 Keeping files readable
9.2.2 Keeping datasets interpretable
9.2.3 Long-term data management
9.3 OUTSOURCING DATA PRESERVATION
9.4 CHAPTER SUMMARY
10 SHARING DATA
10.1 DATA AND INTELLECTUAL PROPERTY
10.1.1 Data and copyright
10.1.2 Licenses and contracts
10.1.3 Patents
10.1.4 Intellectual property and data sharing
10.2 LOCAL DATA SHARING AND REUSE
10.3 COLLABORATIONS
10.4 PUBLIC DATA SHARING
10.4.1 Reasons for public sharing
10.4.2 Sources of public sharing requirements
10.4.3 When and what to share
10.4.4 Preparing your data for sharing
10.4.5 How to share
10.4.6 Licensing shared data
10.5 GETTING CREDIT FOR SHARED DATA
10.5.1 The basics of getting credit for your data
10.5.2 Altmetrics
10.6 CHAPTER SUMMARY
11 DATA REUSE AND RESTARTING THE DATA LIFECYCLE
11.1 FINDING AND REUSING DATA
11.1.1 Finding data
11.1.2 Data reuse rights
11.1.3 Using someone else’s data
11.2 CITING DATA
11.2.1 Citation format
11.2.2 Other citation considerations
11.3 RESTARTING THE DATA LIFECYCLE
REFERENCES
INDEX