Data Management Standard Operating Procedures (SOPs)
Considerable time and effort go into the planning and development stage of any monitoring program, with careful thought given to which parameters should be collected and how. Often the planning stops once the monitoring design is complete, and the data is never properly filed or stored. To make the best use of the data collected, it is important to have a plan in place for data management and file organization. With a solid plan, data will be easy to find in the future.
Data management starts before data is even collected. It is important to have a well-designed field sheet that represents only the parameters you are observing and collecting. It is easy to modify field sheets to suit your needs. Field sheet examples can be found on the Montana State University Extension Water Quality (MSUEWQ) webpage. There is no right or wrong way to make a field sheet. The most important part is to make sure everything you are collecting is represented on the field sheet (even if it is not collected every visit). It is also important not to include information on the field sheet that will not be collected. Extra fields create many blank spots and make it more difficult to determine whether the field sheet has been properly filled out.
It is very important to fill out the field sheet thoroughly while you are in the field. Before you leave a site, look over the field sheet and make sure you didn't miss anything. It is also important to have a procedure in place for getting the field sheets back to the monitoring coordinator once sampling is completed for the day. Volunteers can drop them off at the office when they are done, scan and email a copy, or send the originals through the mail. Once the field sheets are in the office, look through each one thoroughly and check for missing information or data that looks erratic. Once the field sheets have been checked, they are ready to be properly filed.
Note: If you are submitting water samples to be analyzed by a lab, make sure to request on the chain of custody an EDD (electronic data deliverable) as a .csv file that is formatted for EQUIS.
There are many benefits to scanning and digitizing field sheets. A digital copy of a datasheet can be easier to find and share with others and will not degrade over time. Once a field sheet is scanned, note in its margin the date it was scanned and the initials of the person who scanned it. In the monitoring SAP and SOP, it is important to be very specific about where electronic copies are filed and stored. The field sheets can be organized by date or site, whichever is preferable. Keeping electronic versions of all current hard copies is important, but it is also important to have electronic copies of all historic data sheets, reports, and documents. For organizations that have been operating for years or decades, this could be a monumental task. To aid in the process of digitizing files, it might be worth investing in a high-speed scanner.
Designate a specific location on a computer or share drive where all the scanned field sheets and data will be housed. The electronic version of the field sheet should always be stored on a share drive or hard drive that is accessible to anyone at work. Never store files on a personal computer. It is also important for all files to be backed up on a regular basis. This can be achieved by copying all the files to an external hard drive once a week. Regular backups limit the loss that may occur if a computer or server crashes.
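The weekly copy to an external drive can be scripted so that it is done the same way every time. The following is a minimal sketch in Python, not an established procedure: the function name `back_up_folder` and the folder names in the usage note are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def back_up_folder(source: Path, backup_root: Path, today: Optional[date] = None) -> Path:
    """Copy the whole source folder into a dated subfolder of backup_root."""
    today = today or date.today()
    # A YEAR-MONTH-DAY suffix keeps the backup folders in chronological order.
    destination = backup_root / f"{source.name}_{today.isoformat()}"
    # copytree raises FileExistsError if this dated backup already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(source, destination)
    return destination
```

For example, `back_up_folder(Path("S:/Monitoring"), Path("E:/Backups"))` would copy a (hypothetical) shared monitoring folder to a dated folder on an external drive.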
After scanning field datasheets, it is important to file the hard copy as well. Organize file folders by sample sites or sampling dates and make sure to always file the hard copies after scanning.
1. "Show All Folders" view in Windows Explorer - this view in Windows Explorer allows you to see your whole folder hierarchy at one time on the left side of your screen. In Windows 7, you turn it on in Windows Explorer under "Organize" -> "Folder and Search Options" -> "Navigation pane" by checking the box for "Show all folders."
2. Include the date and your initials at the end of a file name - keeping track of which file is the most current version is often very challenging. Including the date and the initials of the person editing at the end of the file name helps address this issue.
3. Create an Archive Folder for old versions of files - with storage space becoming cheaper all the time, there is less motivation to delete old files that you might need again later, but you don't want them clogging up your folders. Consider starting an "Archive" folder where you drag old versions that you don't think you will need anymore but are not ready to delete.
4. Use a zero at the beginning of a file you want to find quickly in a folder - the default file order in a folder is alphabetical order with numbers coming before the letters. So, if you have an index or a summary file, you may want to name it with a zero in the front so it is always at the top of the folder.
5. Quick Reference Word Document in a folder to remember what you did - save a Word document inside a folder with a title like "0_FilesMoved" that describes where to find files that you might expect in this location but that are stored elsewhere.
6. Reorganize files with 2 file explorer windows - when doing a lot of file reorganization it is helpful to have two explorer windows open next to one another for easy dragging of folders and files from an old structure to a new structure.
A well organized folder structure will help you find files quickly and ultimately save you a lot of time. In general, a good folder structure means there is one and only one location where you would save a file and it is easy to navigate quickly to that location.
3.2.1 Laying out your folder structure
The best folder structure will look different for everyone, but the key is to identify the most sensible ways to separate your information into groups for you and others in your organization. When you open your computer to look for a file, what characteristic of that file comes to mind first: the year it was done, the watershed or stream, the project name, the landowner name, or the type of project? These are the types of things that should be part of your folder structure. The goal is to lay out an intuitive structure where every file has a place and ONLY ONE place. It is useful to go through this process with someone else in your organization so that the structure you come up with makes sense to more people.
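Once the grouping characteristics are chosen, the whole hierarchy can be created in one pass so that every file has exactly one home. The sketch below is only an illustration: the ordering (watershed, then year, then category) and the names `folder_for` and `create_structure` are assumptions, and your own structure may group things differently.

```python
from pathlib import Path


def folder_for(root: Path, watershed: str, year: int, category: str) -> Path:
    """Return the single folder where a file with these traits belongs."""
    # Hypothetical ordering: watershed first, then year, then category.
    return root / watershed / str(year) / category


def create_structure(root, watersheds, years, categories):
    """Create every folder in the hierarchy and return the list of folders."""
    created = []
    for watershed in watersheds:
        for year in years:
            for category in categories:
                folder = folder_for(root, watershed, year, category)
                # exist_ok=True lets the script be re-run when a new year is added.
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created
```

Because `folder_for` is the only place that decides where a file goes, two people filing the same kind of document will end up in the same folder.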
File names should be descriptive enough to tell you quickly what is in the file and ideally should tell you which file is the latest version. Consistency in file naming is more important than exactly how you decide to name them.
3.3.1 Naming files
The beginning of a file name is important for two reasons: 1) that is where your eye goes first when scanning, and 2) it determines the order of files when they are sorted by name. A good file name also depends on your folder structure. For example, if all of your files for a project are in a folder named for that project, it is less important that the project name is included in the file name.
3.3.2 Date and initials as part of the file name
Most files we work on are updated through time, possibly by multiple people; keeping track of the latest version is critical and can be challenging. Consider putting a date, and possibly a time and initials, at the end of your file names. If you put the date in the format of YEAR-MONTH-DAY, with two-digit months and days, then the newest version will always be at the bottom of the list when files are sorted by name. If the last person to edit a file always puts their initials at the end of the file name, then you can always tell who edited a file last and on what day. You cannot always rely on the date modified stamp that Windows puts on a file, because this information is sometimes lost when a file is transferred.
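A short sketch shows why the YEAR-MONTH-DAY format matters: sorting the names as plain text also sorts them by date. The helper `versioned_name`, the file stem, and the initials are all hypothetical.

```python
from datetime import date


def versioned_name(stem: str, initials: str, extension: str, edited: date) -> str:
    """Build a file name ending in YEAR-MONTH-DAY and the editor's initials."""
    # isoformat() always gives zero-padded YYYY-MM-DD.
    return f"{stem}_{edited.isoformat()}_{initials}.{extension}"


names = [
    versioned_name("FieldData", "JS", "xlsx", date(2015, 11, 3)),
    versioned_name("FieldData", "AB", "xlsx", date(2016, 2, 14)),
    versioned_name("FieldData", "JS", "xlsx", date(2015, 4, 27)),
]

# Sorting by name also sorts by date, so the newest version is last.
latest = sorted(names)[-1]
print(latest)  # FieldData_2016-02-14_AB.xlsx
```

A MONTH-DAY-YEAR format would not sort this way, because files from the same month in different years would be grouped together.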
Most people will enter the data in a format that looks like the example below. The column headings contain site name, date, parameters, etc., and the rows are populated with the information from individual sampling events. Entering data in this manner makes it easier to communicate findings to broader groups through tables, graphs, and figures.
If you are required to upload data to EQUIS, consider using the EQUIS csv template as the starting point and populating it with collected field and lab data. The EQUIS format requires each row in Excel to contain information for one sample point. This is valuable because it allows you to include information about how the data was collected and analyzed, but it makes producing graphs more difficult. Starting from the template saves time and frustration if uploading data to EQUIS is the main priority.
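The difference between the two layouts, one row per sampling event versus one row per measured value, can be sketched in a few lines of Python. This is only an illustration of the reshaping idea: the column names (`site`, `date`, `parameter`, `value`) and the numbers are made up and are not the actual EQUIS template fields.

```python
def wide_to_long(rows, id_columns=("site", "date")):
    """Turn one row per sampling event into one row per measured value."""
    long_rows = []
    for row in rows:
        ids = {column: row[column] for column in id_columns}
        for column, value in row.items():
            if column in id_columns or value in ("", None):
                continue  # skip identifiers and parameters not measured
            long_rows.append({**ids, "parameter": column, "value": value})
    return long_rows


# Illustrative values only, not real measurements.
wide = [
    {"site": "Site A", "date": "2016-06-01", "temp_c": 12.5, "ph": 8.1},
    {"site": "Site B", "date": "2016-06-01", "temp_c": 14.0, "ph": ""},
]
for record in wide_to_long(wide):
    print(record)
```

In the long layout each row has room for its own method or lab information, which is why it suits database uploads, while the wide layout is more convenient for graphing.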
Another option, besides Excel, is to keep data in an Access database. Setting up the database can be challenging, but once the tables are set up and formatted correctly, data entry and analysis can be very easy. Keeping data in Access is recommended for advanced database users.
4.1.1 Meta Data
Meta data is information about how the data was collected. Storing data long term, or collecting data to be uploaded to an online database such as EQUIS or VOEIS, will always require meta data. The key to long term storage is that it must be stable and accessible over the long term, independent of staff changes, loss of funding, etc. Meta data can be included in the Excel sheet under a column heading called 'collection method'. The collection method should always follow what is outlined in the project SAP and SOP. If you change methods for a sampling event, make sure to note this on the field datasheets and in the meta data. Another option for keeping track of meta data is to create a code for each of the different sampling methods on a separate tab in the Excel workbook and then use the code on the tab that contains all the data and meta data.
• Methods of data collection, analytical methods used, who collected the data, capabilities of data collectors, quality control measures in place, reference to the Sample Analysis Plan.
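The method-code approach works like a lookup table: the data rows carry a short code, and a separate list explains what each code means. A minimal sketch, assuming made-up codes (`M1`, `M2`), descriptions, and values:

```python
# Hypothetical method codes, standing in for the separate methods tab.
METHODS = {
    "M1": "Handheld meter reading taken in the stream",
    "M2": "Grab sample sent to a lab",
}

# Data rows carry only the short code, as on the main data tab.
# Values are illustrative, not real measurements.
data_rows = [
    {"site": "Site A", "parameter": "temp_c", "value": 12.5, "method": "M1"},
    {"site": "Site A", "parameter": "nitrate", "value": 0.3, "method": "M2"},
]


def describe_methods(rows, methods):
    """Attach the full method description to each row; reject unknown codes."""
    described = []
    for row in rows:
        if row["method"] not in methods:
            raise KeyError(f"Unknown method code: {row['method']}")
        described.append({**row, "collection_method": methods[row["method"]]})
    return described
```

Rejecting unknown codes catches typos at data entry time, before a row with undocumented methods reaches long term storage.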
4.1.2 Quality Control
Entering data from field datasheets or from a lab can be a tedious process, and it is easy to mis-key a number. Always take your time when entering numbers to ensure you are keying in the correct information. If a volunteer is available to help, it can be beneficial to have one person read the values off the field datasheet while the other person enters them. This way two people are double-checking what gets entered on the computer. Another simple quality control check is to randomly choose 10% of the field datasheets and check their values against the Excel sheet.
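The 10% spot check is easy to automate so that the choice of sheets is truly random. The sketch below is one possible approach; the function names, the rule of always checking at least one sheet, and the rounding are assumptions.

```python
import random


def pick_sheets_to_check(sheet_ids, fraction=0.10, seed=None):
    """Randomly choose a fraction of field sheets (at least one) to re-check."""
    sheet_ids = list(sheet_ids)
    if not sheet_ids:
        return []
    count = max(1, round(len(sheet_ids) * fraction))
    # A seed makes the selection repeatable; leave it as None for a fresh draw.
    return random.Random(seed).sample(sheet_ids, count)


def find_mismatches(sheet_values, entered_values):
    """Return fields whose entered value differs from the field sheet."""
    return {
        field: (sheet_values[field], entered_values.get(field))
        for field in sheet_values
        if entered_values.get(field) != sheet_values[field]
    }
```

For each selected sheet, `find_mismatches` returns a pair of (value on the sheet, value entered) for every field that disagrees, so an empty result means the sheet was keyed in correctly.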
4.2.1 Local Storage
Local storage refers to storage on a computer, share drive or hard drive at the office. Local storage is where the raw data, project documents, and field datasheets will live. It's important to designate a specific location where all documents will be housed, so that everyone who works on the project stores files in the right place. It is also important to keep a backup of all project files on a share drive (Dropbox, Google Drive, etc.) and a hard drive. If files are backed up on a share drive, you will not lose them if a computer crashes or if you have employee turnover.
4.2.2 Public Database
There are options that allow you to upload data to a database that can be viewed by the public. In Montana, the Department of Environmental Quality (DEQ) has an online repository for storing water quality monitoring data, which includes physical, chemical, biological, and habitat data from a variety of projects across the state. This database is called EQUIS, and some projects will be required to upload their data to EQUIS if they are funded by DEQ.
Another public database option is VOEIS (Virtual Observatory and Ecological Informatics System), which is an open source database created by researchers and scientists to house environmental data. In order to upload data to VOEIS, you need to collaborate with MSUEWQ. VOEIS allows groups to upload water quality and quantity data that is viewable by the public, and it offers maps of the sampling locations and rudimentary graphs of the data. Instructions for uploading data can be found here.
4.2.3 SAP/SOP storage
It is important to update the project SAP and SOP each year to accurately represent all pertinent information about the monitoring (site information, parameters collected, volunteers, etc.). The changes should happen at the beginning of each year before monitoring has started. To keep track of the changes from year to year, a track changes table should be added at the beginning of each document that summarizes the changes made and when. For example:
As mentioned previously, SAP and SOPs should be housed in a designated location on a share drive with all the other project files.
Once data is organized in a database or Excel, it is easy to use the data to make graphs, tables or summary statistics. Graphing data is a great way to visualize the information. Two common and useful graphs for displaying environmental data are time series and site information graphs. Time series graphs plot data through time, with time on the x-axis and a parameter (or two) on the y-axis, and usually represent one site. Site information graphs have sites on the x-axis and parameters on the y-axis and can represent any number of different sites, which is useful for comparing data between sites.
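The two graph types draw on the same records but slice them differently. The sketch below prepares the points for each graph without plotting them; the records are illustrative, and restricting the site comparison to a single date is an assumption made to keep the example small.

```python
# Illustrative records only: (site, date, value) for a single parameter.
records = [
    ("Site A", "2016-06-01", 12.5),
    ("Site B", "2016-06-01", 14.0),
    ("Site A", "2016-07-01", 16.2),
    ("Site B", "2016-07-01", 17.8),
]


def time_series(records, site):
    """Points for a time series graph: dates on x, values on y, one site."""
    return sorted((day, value) for s, day, value in records if s == site)


def site_comparison(records, day):
    """Points for a site graph: sites on x, values on y, one date."""
    return sorted((s, value) for s, d, value in records if d == day)
```

The resulting lists of (x, y) pairs can be pasted into Excel or passed to any plotting tool.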
Another way to display data is through tables and summary statistics. Summary statistics are a great way to display important numbers. They can include the min, max, mean, or the number of times a value is over a standard. No matter how you choose to analyze and interpret the data, it's important to get the information back out to the individuals or partners who helped you collect the data. This can be done simply and quickly by writing a short email and attaching a few key figures, or it could be more involved, such as an annual report. The Madison Stream Team is a great example of an annual report for water monitoring with volunteers: Link to report.
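The summary statistics mentioned above (min, max, mean, and the number of times a value is over a standard) take only a few lines to compute. In this sketch the readings and the threshold are illustrative, not real data or a real water quality standard, and the function name `summarize` is hypothetical.

```python
from statistics import mean


def summarize(values, standard):
    """Return min, max, mean, and the count of values above a standard."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "times_over_standard": sum(1 for value in values if value > standard),
    }


# Illustrative readings and threshold only.
readings = [10.0, 15.0, 20.0, 25.0]
print(summarize(readings, standard=18.0))
```

Here the mean is 17.5 and two of the four readings exceed the threshold of 18.0.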