Correlation Using the R Statistical Package - Part 2: Data Preparation
Plant and Soil Sciences eLibrary (PASSeL) Lesson
In the first module, we looked at the field experiment as described by the domain scientist, and we also looked at the data from the analyst's perspective. In this module, we will learn how to install the R software and then explore a few basics of the data files generated by sensors and data loggers. We will also take a quick tour of tools that can be used to look at the raw data. We will then develop a methodology for preparing raw data for the computations used in the analysis. Finally, we will look at a few "data types" that are typically used in R.
When conducting research, it is sometimes important to know whether two characteristics are related to each other. In plant breeding, for example, one might ask whether deep roots are related to drought resistance. Correlation is a measure of dependence, or statistical relationship, between two random variables or two sets of data. It measures the strength of the linear relationship between two variables: for example, the deeper the roots, the better the plant can withstand drought. Not every climate will show a strong correlation between these two traits.
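As a rough illustration of the definition above, the following R sketch computes a Pearson correlation coefficient in two ways: with base R's cor() function and directly from the formula. The root-depth and drought-tolerance values are made up for illustration and are not data from the experiment, and the variable names are likewise hypothetical.

```r
# Hypothetical illustration data (NOT from the lesson's experiment):
# root depth in cm and a drought-tolerance score for six plants.
root_depth_cm <- c(40, 55, 62, 70, 85, 90)
drought_score <- c(2.1, 2.8, 3.0, 3.9, 4.2, 4.8)

# Pearson correlation using base R (method = "pearson" is the default).
r_builtin <- cor(root_depth_cm, drought_score)

# The same coefficient computed from its definition:
# sum of products of deviations, divided by the square root of the
# product of the sums of squared deviations.
dx <- root_depth_cm - mean(root_depth_cm)
dy <- drought_score - mean(drought_score)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r_builtin)
print(all.equal(r_builtin, r_manual))
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship. Because both made-up series increase together, the coefficient here is positive.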
"R" is a free programming language and software environment for statistical computing and graphics. R is widely used in both academia and industry for data analysis and modeling. In this lesson module, we will introduce you to a real-world experiment in which one objective is to determine which Canopy Spectral Reflectance (CSR) indices and growth stages best correlate with yield and/or yield components under water-stressed and non-stressed treatments in 300 winter wheat lines.
This lesson does not require any prior knowledge of the R language and environment, nor do you need to be fluent in plant breeding research. However, users who have a basic understanding of the statistical technique called "correlation" will be able to take better advantage of the lesson. If you need a refresher on correlation, watch Statistics 101: Understanding Correlation by Brandon Foltz on YouTube.
After completing this lesson, you will be able to:
1. Describe the real-world plant breeding experiment that utilizes the Canopy Spectral Reflectance (CSR) tool.
2. List the objectives of the CSR experiment.
3. Explain how the data is being collected for the experiment, including how the data looks when it is received from the sensors and the data loggers.
4. List some of the measures (indices) that are of interest to the scientist and relate those to the data that is collected in the field.