The Principles and guidelines for the management of research data at UiT require all employees and students to document their data in accordance with best practice and with future reuse in mind:
Research data must be documented with metadata, method descriptions, and permanent identifiers that allow other researchers to find and use the data. Metadata must comply with international standards or de facto standards where applicable, and must describe the data content with a focus on future use.
All work with research data must be meticulously documented, with generous amounts of metadata and a descriptive ReadMe file. It is good practice to start documenting early and to keep adding information throughout the project. Documentation procedures should be defined during the planning phase. If you postpone organizing and documenting your data, critical information risks being lost or recorded incorrectly. Careful planning can save a lot of time and avoid unnecessary duplication of work.
Metadata is structured, standardized information about your data. Metadata is receiving increasing attention, and requirements for it are growing, because it is essential for making research data FAIR (Findable, Accessible, Interoperable and Reusable). Machine-readable metadata enables indexing and searching, and provides contextual information that is critical for understanding and reusing data across technological platforms, institutions, and borders. How FAIR a dataset is depends on the quality and scope of its metadata. It is therefore important to document data using properly completed metadata forms.
Many different standards for metadata documentation have been developed, both generic and subject-specific. Follow the scientific conventions set for your field, and use standardized terms, taxonomies/ontologies, and vocabulary whenever possible. This increases the reusability of your data.
Many data archives, organizations, and journals set metadata requirements. Check this early on so you know what metadata to collect for your project.
Examples of metadata standards are Dublin Core (generic), Darwin Core (biodiversity data) and the Data Documentation Initiative (DDI, mainly used in the social sciences). The Research Data Alliance, FAIRsharing.org and the Digital Curation Centre all provide overviews of various standards.
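To illustrate what "machine-readable metadata" means in practice, here is a minimal sketch that restricts a record to the 15 element names of the Dublin Core Metadata Element Set (version 1.1) and serializes it as JSON. The helper `make_dc_record` is hypothetical, the field values are placeholders, and the JSON serialization is an assumption chosen for readability: repositories usually collect Dublin Core through their own deposit forms or as XML, so use whatever format your archive requires.

```python
import json

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_dc_record(**fields: str) -> str:
    """Hypothetical helper: return a JSON record that uses only Dublin Core element names."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        # Reject non-standard keys so the record stays interoperable.
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    return json.dumps(fields, indent=2, ensure_ascii=False)


# Placeholder values only; replace them with your dataset's real metadata.
record = make_dc_record(
    title="Example dataset title",
    creator="Surname, Given name",
    date="2021-06-01",
    type="Dataset",
    language="en",
    rights="CC BY 4.0",
)
print(record)
```

Rejecting unknown keys is a deliberate choice: a misspelled or invented field name would otherwise silently produce metadata that other systems cannot interpret.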
Tools for simplifying documentation have been created for some metadata standards. However, in most cases, it will be more practical to collect the information in a ReadMe file that is saved alongside the data (see below). This will also be a good alternative if no metadata standard exists for your field of research.
ReadMe files are plain text files traditionally used to describe software packages. When working with data, a ReadMe file that accompanies the dataset and serves as a guide to understanding the data can be very useful. The ReadMe file should ensure that you and others can understand the data when the dataset is shared and published.
It is recommended that you create the ReadMe file early on and place it in the dataset's main directory. Update the file every time you process or modify the data.
The ReadMe file should explain how the dataset was created, how complete it is, and under what conditions it can be reused. Much of the information in a ReadMe file will overlap with generic metadata, but the ReadMe file must also include a detailed method description, an overview of the files, and an explanation of their contents. Make your descriptions as specific and clear as possible. Define terms and acronyms, and use well-established technical terms. This is necessary to make the dataset FAIR and reusable. An added benefit of a good method description is that the text can be reused when you publish articles.
A ReadMe file must have at least the following:
- General background information (title, DOI, contact info, date, place, ownership, funder).
- Descriptions of methods (protocols, instruments, software).
- File overview.
- File-specific information, including a description of variables and units.
- Reference and conditions for reuse.
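Because the ReadMe file should be created early and kept in the dataset's main directory, it can help to generate a skeleton containing the minimum sections listed above. The following is a minimal sketch: the function `write_readme_template`, the file name `README.txt` and the exact headings are assumptions for illustration, not a prescribed UiT or DataverseNO template.

```python
from pathlib import Path

# Minimum ReadMe content from the list above; the heading wording is illustrative.
SECTIONS = [
    ("GENERAL INFORMATION", ["Title:", "DOI:", "Contact information:", "Date:",
                             "Place:", "Ownership:", "Funder:"]),
    ("METHODS", ["Protocols:", "Instruments:", "Software:"]),
    ("FILE OVERVIEW", ["<file name>: <short description>"]),
    ("FILE-SPECIFIC INFORMATION", ["<file name>", "  Variables:", "  Units:"]),
    ("REUSE", ["Recommended citation:", "Conditions for reuse / license:"]),
]


def write_readme_template(dataset_dir: Path) -> Path:
    """Hypothetical helper: create a plain-text ReadMe skeleton in the dataset's main directory."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    readme = dataset_dir / "README.txt"
    if readme.exists():
        # Never overwrite an existing ReadMe; it may already hold documentation.
        return readme
    lines = []
    for heading, fields in SECTIONS:
        lines.append(heading)
        lines.append("-" * len(heading))  # underline the heading in plain text
        lines.extend(fields)
        lines.append("")  # blank line between sections
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```

For example, `write_readme_template(Path("first_study"))` creates `first_study/README.txt` once; later calls leave your edits untouched, so the file can be filled in and updated throughout the project.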
Templates and examples of ReadMe files can be found in the user guide for DataverseNO.
Examples of other relevant documentation that should accompany the dataset:
- Descriptions, instructions, and protocols for the phases of collection, processing, and analysis.
- Configuration files and log files from calibration, processing, and analysis.
- Data dictionaries and codebooks.
- Variable lists.
- Information letter and consent form.
- Notification form submitted to NSD, and ethical approvals.
- Questionnaire and interview guide.
- Permits and licenses from rights holders, if any.
File and folder organization and naming
It is important that you and your colleagues agree early on how the research data should be organized, and that everyone involved follows this agreement. Make a plan for how the data will be organized in files and folders and how these will be named. Clear and simple file and folder names are essential.
Tips for organizing your files:
- A hierarchical folder structure can help you keep track of and structure your data.
- Organize the folders into relevant categories.
- All folders should follow a consistent naming structure, and folder names should reflect their contents.
- Make the file names reflect the folder structure. This will make it easier to keep track of the data when you archive it later.
- Use wording that is meaningful in the project. It should be possible to understand the contents of a file without opening it.
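A hierarchical folder structure is easiest to keep consistent if it is created the same way for every project or sub-study. The sketch below assumes a hypothetical layout (`data/raw`, `data/processed`, `analysis/scripts`, `analysis/results`, `documentation`); the categories and the function `create_structure` are illustrative and should be replaced with whatever your team agrees on.

```python
from pathlib import Path

# Hypothetical layout: adapt the categories to your own project.
FOLDERS = (
    "data/raw",
    "data/processed",
    "analysis/scripts",
    "analysis/results",
    "documentation",
)


def create_structure(root: Path, folders: tuple[str, ...] = FOLDERS) -> list[Path]:
    """Create each folder under root (including parents) and return the created paths."""
    created = []
    for relative in folders:
        path = root / relative
        # exist_ok=True makes the function safe to run again on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Keeping the agreed layout in one shared script means every collaborator gets the same folder names, which avoids overlapping categories and similar folders in different locations.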
Some general guidelines for naming files and folders:
- Use consistent file names.
- Use descriptive but short file names.
- Avoid spaces. Instead, use underscores (e.g. first_study), hyphens (e.g. first-study), or camel case (e.g. FirstStudy).
- Avoid special characters such as \ / ? : * " < > | # % { } ^ [ ] ` ~ and letters such as æ Æ ø Ø å Å ä Ä ö Ö.
- Use international date format: YYYY-MM-DD (e.g. 2021-06-01).
- If the files are numbered, use leading zeros (e.g. 001 instead of 1) so that they sort in the correct order.
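The guidelines above can be checked automatically before files are shared or archived. This is a minimal, heuristic sketch: the function `naming_problems` is hypothetical, it only detects day-first dates written as DD.MM.YYYY or DD-MM-YYYY, and it only inspects a sequence number placed directly before the file extension after an underscore (for example `scan_7.tif`). The required number of digits (`min_digits=3`) is an assumption to adjust to the number of files you expect.

```python
import re

# Characters the guideline advises against.
AVOID = set('\\/?:*"<>|#%{}^[]`~æÆøØåÅäÄöÖ')

# A day-first date such as 01.06.2021 or 01-06-2021 (i.e. not YYYY-MM-DD).
DAY_FIRST_DATE = re.compile(r"(?<!\d)\d{2}[.\-]\d{2}[.\-]\d{4}(?!\d)")

# A sequence number right before the extension, e.g. "_7" in "scan_7.tif".
TRAILING_NUMBER = re.compile(r"_(\d+)\.[A-Za-z0-9]+$")


def naming_problems(name: str, min_digits: int = 3) -> list[str]:
    """Return the guideline violations found in a file name (empty list if none)."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains spaces")
    bad = sorted({ch for ch in name if ch in AVOID})
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if DAY_FIRST_DATE.search(name):
        problems.append("date is not in YYYY-MM-DD format")
    match = TRAILING_NUMBER.search(name)
    if match and len(match.group(1)) < min_digits:
        problems.append(f"number not zero-padded to {min_digits} digits")
    return problems
```

For example, `naming_problems("first_study_2021-06-01_001.csv")` returns an empty list, while `naming_problems("scan_7.tif")` reports the missing zero-padding. Treat the result as a prompt for manual review rather than a complete validation.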
Examples of elements that can be included in file names:
- Date/time interval/location.
- Name of study/project.
- Version number.
- File content.
- Name/initials of the researcher.
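Building names from these elements in a fixed order keeps them consistent across a project. The following is a minimal sketch under stated assumptions: the helper `build_file_name`, the element order (date first, so names sort chronologically), the `v01`-style version tag and the use of underscores between elements are all illustrative choices, not a required convention.

```python
from datetime import date


def build_file_name(project: str, content: str, when: date, version: int,
                    initials: str = "", extension: str = "csv") -> str:
    """Hypothetical helper: join name elements with underscores in a fixed, sortable order."""
    # ISO date (YYYY-MM-DD) first, zero-padded version number last.
    parts = [when.isoformat(), project, content, f"v{version:02d}"]
    if initials:
        parts.append(initials)
    # Spaces inside an element become hyphens, so underscores only separate elements.
    cleaned = [part.strip().replace(" ", "-") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


print(build_file_name("first study", "interviews", date(2021, 6, 1), 3, "AB"))
# prints 2021-06-01_first-study_interviews_v03_AB.csv
```

Document the chosen element order in the project's ReadMe so that others can decode the names without opening the files.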
Avoid:
- Non-descriptive, generic folder names such as "Current".
- Naming folders after people within a project; folder names should reflect the content.
- Overlapping categories or multiple similar folders located in different locations.
- Multiple copies of the same file in different folders. If necessary, you can create shortcuts to a file.
File and folder names usually determine how files are sorted, so the sort order you want can decide which naming syntax you choose.
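The effect of name syntax on sorting is easy to demonstrate. The sketch below uses Python's built-in `sorted()`, which compares names character by character; some file managers apply "natural" sorting instead, so results in a graphical file browser may differ. The file names are made-up examples.

```python
# Day-first dates and unpadded numbers do not sort correctly as plain text.
day_first = ["15.03.2020_log.txt", "01.06.2021_log.txt", "02.01.2022_log.txt"]
iso = ["2021-06-01_log.txt", "2020-03-15_log.txt", "2022-01-02_log.txt"]
unpadded = ["scan_10.tif", "scan_2.tif", "scan_1.tif"]
padded = ["scan_010.tif", "scan_002.tif", "scan_001.tif"]

print(sorted(day_first))
# ['01.06.2021_log.txt', '02.01.2022_log.txt', '15.03.2020_log.txt']  (2020 ends up last)
print(sorted(iso))
# ['2020-03-15_log.txt', '2021-06-01_log.txt', '2022-01-02_log.txt']  (chronological)
print(sorted(unpadded))
# ['scan_1.tif', 'scan_10.tif', 'scan_2.tif']  (10 sorts before 2)
print(sorted(padded))
# ['scan_001.tif', 'scan_002.tif', 'scan_010.tif']  (numeric order)
```

ISO dates and zero-padded numbers sort correctly in both plain and natural sorting, which is why they are the safer choice.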
Remember to document the folder organization and naming conventions in a ReadMe file (see above) at the top level of the folder hierarchy.
A webinar on the topic of organizing and documenting research data is held every semester for those interested in learning more. A PowerPoint presentation with more information is also available on the course page.
If you need advice and guidance with metadata and documentation, please contact the research support team at researchdata@hjelp.uit.no.
Updated: 14.12.2023, updated by: Noortje Haugstvedt