Starting Data Analysis for Staffanson 2015

With all of my data on the 2015 heads from Staffanson Prairie Preserve collected and visualized, it is now time to begin some initial data analysis. As I mentioned in my last post, I will be using R for all of my analyses, taking advantage of skills I learned in Stuart’s class at Northwestern this quarter. I will build statistical models from the data to look for relationships between variables that may influence mate availability and reproductive success. The variables I am specifically looking at are:

  • Distance to the kth nearest flowering neighbor. This is a measure of spatial isolation: the greater the distance, the more isolated the plant. Stuart has previously found a significant relationship between this distance and reproductive success in Echinacea (that study can be found here:
  • Start date and flowering duration. Flowering phenology, the timing and duration of flowering, may be even more important than spatial isolation in determining the availability of compatible mates. If two plants are very close in space but do not flower at the same time, they cannot mate with each other. Previous data have shown that plants flowering earlier in the season have higher reproductive success (
  • Section of the head from which the achene originated. Because florets at the base of Echinacea heads flower first and those at the top flower last, it is possible to examine how reproductive success, and thus the mating scene, changes over time within a single head. The bottom 30 achenes, the middle achenes, and the top 30 achenes of each head are kept separate to represent the beginning, middle, and end of that head's flowering.
  • Seed set, the proportion of achenes that contain a seed, will be used to quantify reproductive success. This is the response variable that my model will try to predict from the variables listed above.
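To make these variables concrete, here is a minimal sketch of how each one could be derived from raw records. It is written in Python rather than the R I am using for the real analysis, and it assumes hypothetical data layouts: plant coordinates in meters with a single flowering flag, the days of the year on which a head was seen flowering, and a bottom-to-top list of True/False values marking whether each achene holds a seed. The function names and toy numbers are illustrative only, not the actual Staffanson data.

```python
import math


def kth_nearest_distance(focal_id, plants, k):
    """Distance from the focal plant to its kth nearest flowering neighbor (k >= 1).

    plants maps plant id -> (x, y, is_flowering); coordinates assumed in meters.
    Returns math.inf when fewer than k other plants are flowering.
    """
    fx, fy, _ = plants[focal_id]
    dists = sorted(
        math.dist((fx, fy), (x, y))
        for pid, (x, y, flowering) in plants.items()
        if pid != focal_id and flowering
    )
    return dists[k - 1] if len(dists) >= k else math.inf


def phenology(flowering_days):
    """Start day and duration in days, assuming the head flowered continuously
    from the first to the last day of year on which it was seen flowering."""
    start, end = min(flowering_days), max(flowering_days)
    return start, end - start + 1


def head_sections(achenes, n_end=30):
    """Split a head's achenes, ordered bottom to top, into three sections."""
    if len(achenes) <= 2 * n_end:
        raise ValueError("head too small for non-empty bottom, middle, and top sections")
    return {
        "bottom": achenes[:n_end],
        "middle": achenes[n_end:len(achenes) - n_end],
        "top": achenes[len(achenes) - n_end:],
    }


def seed_set(achenes):
    """Proportion of achenes that contain a seed (True = seed present)."""
    return sum(achenes) / len(achenes)


# Toy data, not real Staffanson measurements.
plants = {
    "A": (0.0, 0.0, True),
    "B": (3.0, 4.0, True),   # 5 m from A
    "C": (6.0, 8.0, True),   # 10 m from A
    "D": (1.0, 0.0, False),  # closest to A, but not flowering, so ignored
}
print(kth_nearest_distance("A", plants, k=2))  # 10.0

print(phenology([190, 191, 195]))  # (190, 6)

achenes = [True] * 40 + [False] * 10 + [True] * 6 + [False] * 24
for name, section in head_sections(achenes).items():
    print(name, seed_set(section))
# bottom 1.0, middle 0.5, top 0.2
```

In practice, "flowering" for the isolation measure would depend on the date, so a real version would likely filter neighbors by flowering overlap rather than by a single flag.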

To build the model, I will use a technique known as backwards elimination, as described in Statistics: An Introduction Using R (Crawley 2015). I will start with a statistical model containing my response variable (seed set), all of my explanatory variables (isolation, phenology, and section of head), and all interaction effects among the explanatory variables. I will then remove a single predictor or interaction at a time and use an analysis of deviance to test whether removing that term significantly reduces the model's explanatory power. If it does, the term goes back in; if it does not, it stays out. This process continues until every predictor and interaction left in the model has a significant effect on the response. The result, known as the minimal adequate model, is the simplest model that still includes all of the important variables.
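The elimination loop described above can be sketched as follows. This is a Python illustration of the procedure, not the R code I will actually run. The `drop_p_value` callback is a hypothetical stand-in for fitting the model with and without a term and comparing the two by analysis of deviance (in R, typically `update()` followed by `anova(..., test = "Chisq")`), and the p-values in the toy table are made up. The sketch assumes a significance threshold of 0.05, removes the least significant eligible term each round, and follows the common marginality convention that a main effect or lower-order interaction is not tested while it is still part of a higher-order interaction in the model.

```python
from itertools import combinations


def all_terms(variables):
    """Full model: every main effect plus every interaction among the variables."""
    return [frozenset(c)
            for r in range(1, len(variables) + 1)
            for c in combinations(variables, r)]


def backward_eliminate(terms, drop_p_value, alpha=0.05):
    """Repeatedly drop the least significant eligible term until every
    eligible term has p <= alpha; returns the remaining set of terms."""
    model = set(terms)
    while model:
        # Marginality: a term is eligible only if no higher-order term
        # still in the model contains it. Sorting makes tie-breaking deterministic.
        candidates = sorted(
            (t for t in model if not any(t < other for other in model)),
            key=lambda t: (len(t), sorted(t)),
        )
        p_values = {t: drop_p_value(model, t) for t in candidates}
        weakest = max(candidates, key=p_values.get)
        if p_values[weakest] <= alpha:
            break  # every testable term is significant: minimal adequate model
        model.remove(weakest)
    return model


# Made-up p-values standing in for real analysis-of-deviance results.
FAKE_P = {
    ("dist", "section", "start"): 0.80,
    ("dist", "section"): 0.40,
    ("dist", "start"): 0.03,
    ("section", "start"): 0.60,
    ("dist",): 0.01,
    ("section",): 0.02,
    ("start",): 0.04,
}


def fake_drop_p_value(model, term):
    """Hypothetical test: look up a fixed p-value for dropping this term."""
    return FAKE_P[tuple(sorted(term))]


final = backward_eliminate(all_terms(["dist", "start", "section"]), fake_drop_p_value)
print(sorted(":".join(sorted(t)) for t in final))
# ['dist', 'dist:start', 'section', 'start']
```

With these toy values, the three-way interaction goes first, then the two weakest two-way interactions, and elimination stops once the weakest remaining term (the distance by start-date interaction) is significant.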

