# Correlation Using the R Statistical Package - Part 2: Data Preparation

Learning Object

2014

## Citation

Plant and Soil Sciences eLibrary (PASSeL) Lesson

This project was supported in part by the National Research Initiative Competitive Grants CAP project 2011-68002-30029 from the USDA National Institute of Food and Agriculture, administered by the University of California-Davis and by the National Science Foundation (NSF), Division of Undergraduate Education, National SMETE Digital Library Program, Award #0938034, administered by the University of Nebraska. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the USDA or NSF.


## Abstract

In the first module, we looked at the field experiment as described by the domain scientist, and we also looked at the data from the perspective of the analyst. In this module, we will learn how to install the R software, and then we will explore a few basics of the data files that are generated by sensors and data loggers. We will also take a quick tour of tools that can be used to look at the raw data. We will then develop a methodology for preparing raw data for the computations used in the analysis. Finally, we will look at a few "data types" that are typically used in R.

Objectives

When conducting research, it is sometimes important to know whether two different characteristics are related to each other. An example in plant breeding might be whether deep roots are related to drought resistance. Correlation is a measure of dependence, or statistical relationship, between two random variables or two sets of data. It measures the strength of the linear relationship between two variables: for example, whether the deeper a plant's roots are, the better the plant withstands drought. Such a relationship will not necessarily appear as a strong correlation in every climate.
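The "strength of the linear relationship" mentioned above is most commonly quantified with Pearson's sample correlation coefficient. This lesson section does not say which coefficient the later modules use, so treat the following as the standard textbook definition rather than the authors' specific method. For paired observations $(x_i, y_i)$, $i = 1, \dots, n$ (for instance, root depth and a drought-resistance score measured on the same plants), with sample means $\bar{x}$ and $\bar{y}$:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Values of $r$ near $+1$ indicate a strong positive linear relationship, values near $-1$ a strong negative one, and values near $0$ little or no linear relationship. In R, the built-in `cor()` function computes this Pearson coefficient by default (`method = "pearson"`).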

"R" is a free-to-use programming language and software environment for statistical computing and graphics. R is widely used in both academia and industry for data analysis and modeling. In this lesson module, we will introduce you to a real-world experiment in which one objective is to determine which Canopy Spectral Reflectance (CSR) indices and growth stages best correlate with yield and/or yield components under water-stressed and non-stressed treatments in 300 winter wheat lines.

This lesson does not require you to have any prior knowledge of the R language and environment, nor do you need to be fluent in plant breeding research. However, users who have a basic understanding of a statistical technique called "correlation" will be able to take better advantage of the lesson. If you need a refresher on correlation, watch Statistics 101: Understanding Correlation by Brandon Foltz on YouTube.

After completing this lesson, you will be able to:

1. Describe the real-world plant breeding experiment that utilizes the Canopy Spectral Reflectance (CSR) tool.

2. List the objectives of the CSR experiment.

3. Explain how the data is being collected for the experiment, including how the data looks when it is received from the sensors and the data loggers.

4. List some of the measures (indices) that are of interest to the scientist and relate those to the data that is collected in the field.
