Exercise 1: Write an expression to compute the number of seconds in a 365-day year, and execute the expression.
The number of seconds in a 365-day year:
365*24*60*60
## [1] 31536000
Exercise 2: Define a workspace object which contains the number of seconds in a 365-day year, and display the results.
A workspace object containing the number of seconds in a 365-day year, and its value:
(s.in.yr <- 365*24*60*60)
## [1] 31536000
Exercise 3: Find the function name for base-10 logarithms, and compute the base-10 logarithm of 10, 100, and 1000 (use the ?? function at the console to search).
??log10
?log10
The function name for base-10 logarithms, and the base-10 logarithm of 10, 100, and 1000:
log10(10); log10(100); log10(1000)
## [1] 1
## [1] 2
## [1] 3
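The general-purpose log function offers the same capability: it takes an optional base argument (defaulting to the natural base \(e\)), so base-10 logarithms can also be computed without log10. A small sketch:

```r
# log() with an explicit base gives the same results as log10()
log(c(10, 100, 1000), base = 10)
# [1] 1 2 3
```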
Exercise 4: What are the arguments of the rbinom (random numbers following the binomial distribution) function? Are any default or must all be specified? What is the value returned?
help(rbinom)
There are three arguments, all of which must be specified: n, the number of observations; size, the number of trials; and prob, the probability of success on each trial. The value returned is a vector of length n with the number of successes in each trial.
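This can be checked programmatically: formals() lists a function's arguments together with any default values, and an empty entry means the argument has no default. A quick sketch:

```r
# list rbinom's arguments; the empty values show that none has a default
formals(rbinom)
names(formals(rbinom))
# [1] "n"    "size" "prob"
```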
Exercise 5: Display the vector of the number of successes in 24 trials with probability of success 0.2 (20%), this simulation carried out 128 times.
(v <- rbinom(128, 24, 0.2))
## [1] 3 10 6 3 4 7 3 9 4 7 5 4 4 3 2 4 10 1 5 2 4 4 4 6 9
## [26] 6 3 5 3 5 6 6 2 7 2 5 7 8 5 7 3 4 3 5 5 6 7 5 7 5
## [51] 5 5 2 4 6 5 4 6 5 1 5 3 5 2 2 5 6 4 4 4 3 2 1 3 5
## [76] 3 5 4 3 7 7 6 5 4 6 4 4 2 3 4 9 6 6 6 3 6 5 5 4 7
## [101] 2 3 4 2 5 5 5 6 6 3 3 4 6 6 3 4 6 3 5 7 2 3 6 5 4
## [126] 3 4 9
Exercise 6: Summarize the result of rbinom (previous exercise) with the table function. What is the range of results, i.e., the minimum and maximum values? Which is the most likely result? For these, write text which includes the computed results. This is necessary because the results change with each random sampling.
print(tv <- table(v <- rbinom(128, 24, 0.2)))
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1 2 8 15 29 23 25 13 7 4 1
(tv.df <- as.data.frame(tv))
## Var1 Freq
## 1 0 1
## 2 1 2
## 3 2 8
## 4 3 15
## 5 4 29
## 6 5 23
## 7 6 25
## 8 7 13
## 9 8 7
## 10 9 4
## 11 10 1
max.count <- max(tv.df$Freq)
ix <- which(tv.df$Freq == max.count)
tv.df[ix, ]
## Var1 Freq
## 5 4 29
The range is from 0 to 10; the modal value is 4; in this simulation that value is found 29 times.
Displaying the modal value is tricky: it requires you to convert the results of table() to a data.frame, find the highest frequency value(s), and then report that value (or those values).
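A more compact (though less transparent) alternative: which.max() applied to the table gives the position of the first maximum count, and names() recovers the corresponding value. Note that if several values tie for the highest frequency, this reports only the first of them:

```r
v <- rbinom(128, 24, 0.2)
tv <- table(v)
# the (first) most frequent value, and its count
names(which.max(tv))
max(tv)
```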
Exercise 7: Create and display a vector representing latitudes in degrees from \(0^\circ\) (equator) to \(+90^\circ\) (north pole), in intervals of \(5^\circ\). Compute and display their cosines – recall, the trig functions in R expect arguments in radians. Find and display the maximum cosine.
A vector representing latitudes in degrees from \(0^\circ\) (equator) to \(+90^\circ\) (north pole), in intervals of \(5^\circ\):
(angles <- seq(0, 90, by=5))
## [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
Their cosines, and the maximum value:
deg.rad <- pi/180
angles*deg.rad
## [1] 0.00000000 0.08726646 0.17453293 0.26179939 0.34906585 0.43633231
## [7] 0.52359878 0.61086524 0.69813170 0.78539816 0.87266463 0.95993109
## [13] 1.04719755 1.13446401 1.22173048 1.30899694 1.39626340 1.48352986
## [19] 1.57079633
round(angles.cos <- cos(angles*deg.rad),4)
## [1] 1.0000 0.9962 0.9848 0.9659 0.9397 0.9063 0.8660 0.8192 0.7660 0.7071
## [11] 0.6428 0.5736 0.5000 0.4226 0.3420 0.2588 0.1736 0.0872 0.0000
max(angles.cos)
## [1] 1
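As a cross-check, base R also provides cospi(x), which computes cos(pi * x) without an explicit conversion factor; dividing the angles in degrees by 180 expresses them as multiples of \(\pi\):

```r
angles <- seq(0, 90, by = 5)
# cospi(x) computes cos(pi * x) accurately, even at the right angle
round(cospi(angles / 180), 4)
```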
Exercise 8: Check if the gstat package is installed on your system. If not, install it. Load it into the workspace. Display its help and find the variogram function. What is its description?
Installing gstat:
install.packages("gstat", dependencies=TRUE)
Or, use the Install toolbar button of the Packages tab in RStudio.
Loading gstat into the workspace:
library(gstat)
Or, check the box next to the package's name in the Packages tab in RStudio.
Help for the variogram function:
help(variogram, package="gstat")
Description: “Calculates the sample variogram from data, or in case of a linear model is given, for the residuals, with options for directional, robust, and pooled variogram, and for irregular distance intervals”.
Exercise 9: Display the classes of the built-in constant pi and of the built-in constant letters.
class(pi)
## [1] "numeric"
class(letters)
## [1] "character"
Exercise 10: What is the class of the object returned by the variogram function? (Hint: see the heading “Value” in the help text.)
help(variogram)
The variogram function returns an object of class gstatVariogram.
Exercise 11: List the datasets in the gstat package.
Datasets in the gstat package:
data(package="gstat")
Exercise 12: Load, summarize, and show the structure of the oxford dataset.
library(gstat)
data(oxford)
summary(oxford)
## PROFILE XCOORD YCOORD ELEV PROFCLASS
## Min. : 1.00 Min. :100 Min. : 100 Min. :540.0 Cr:19
## 1st Qu.: 32.25 1st Qu.:200 1st Qu.: 600 1st Qu.:558.0 Ct:36
## Median : 63.50 Median :350 Median :1100 Median :573.0 Ia:71
## Mean : 63.50 Mean :350 Mean :1100 Mean :573.6
## 3rd Qu.: 94.75 3rd Qu.:500 3rd Qu.:1600 3rd Qu.:584.5
## Max. :126.00 Max. :600 Max. :2100 Max. :632.0
## MAPCLASS VAL1 CHR1 LIME1 VAL2
## Cr:31 Min. :2.000 Min. :1.000 Min. :0.000 Min. :4.00
## Ct:36 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:4.00
## Ia:59 Median :4.000 Median :2.000 Median :4.000 Median :8.00
## Mean :3.508 Mean :2.468 Mean :2.643 Mean :6.23
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:8.00
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :8.00
## CHR2 LIME2 DEPTHCM DEP2LIME PCLAY1
## Min. :2 Min. :0.000 Min. :10.00 Min. :20.00 Min. :10.00
## 1st Qu.:2 1st Qu.:4.000 1st Qu.:25.00 1st Qu.:20.00 1st Qu.:20.00
## Median :2 Median :5.000 Median :36.00 Median :20.00 Median :24.50
## Mean :3 Mean :3.889 Mean :46.25 Mean :30.32 Mean :24.44
## 3rd Qu.:4 3rd Qu.:5.000 3rd Qu.:64.75 3rd Qu.:40.00 3rd Qu.:28.00
## Max. :6 Max. :5.000 Max. :91.00 Max. :90.00 Max. :37.00
## PCLAY2 MG1 OM1 CEC1
## Min. :10.00 Min. : 19.00 Min. : 2.600 Min. : 7.00
## 1st Qu.:10.00 1st Qu.: 44.00 1st Qu.: 4.100 1st Qu.:12.00
## Median :10.00 Median : 72.00 Median : 5.350 Median :15.00
## Mean :14.76 Mean : 93.53 Mean : 5.995 Mean :18.88
## 3rd Qu.:20.00 3rd Qu.:123.25 3rd Qu.: 7.175 3rd Qu.:25.25
## Max. :40.00 Max. :308.00 Max. :13.100 Max. :43.00
## PH1 PHOS1 POT1
## Min. :4.200 Min. : 1.700 Min. : 83.0
## 1st Qu.:7.200 1st Qu.: 6.200 1st Qu.:127.0
## Median :7.500 Median : 8.500 Median :164.0
## Mean :7.152 Mean : 8.752 Mean :181.7
## 3rd Qu.:7.600 3rd Qu.:10.500 3rd Qu.:194.8
## Max. :7.700 Max. :25.000 Max. :847.0
str(oxford)
## 'data.frame': 126 obs. of 22 variables:
## $ PROFILE : num 1 2 3 4 5 6 7 8 9 10 ...
## $ XCOORD : num 100 100 100 100 100 100 100 100 100 100 ...
## $ YCOORD : num 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200 ...
## $ ELEV : num 598 597 610 615 610 595 580 590 598 588 ...
## $ PROFCLASS: Factor w/ 3 levels "Cr","Ct","Ia": 2 2 2 3 3 2 3 2 3 3 ...
## $ MAPCLASS : Factor w/ 3 levels "Cr","Ct","Ia": 2 3 3 3 3 2 2 3 3 3 ...
## $ VAL1 : num 3 3 4 4 3 3 4 4 4 3 ...
## $ CHR1 : num 3 3 3 3 3 2 2 3 3 3 ...
## $ LIME1 : num 4 4 4 4 4 0 2 1 0 4 ...
## $ VAL2 : num 4 4 5 8 8 4 8 4 8 8 ...
## $ CHR2 : num 4 4 4 2 2 4 2 4 2 2 ...
## $ LIME2 : num 4 4 4 5 5 4 5 4 5 5 ...
## $ DEPTHCM : num 61 91 46 20 20 91 30 61 38 25 ...
## $ DEP2LIME : num 20 20 20 20 20 20 20 20 40 20 ...
## $ PCLAY1 : num 15 25 20 20 18 25 25 35 35 12 ...
## $ PCLAY2 : num 10 10 20 10 10 20 10 20 10 10 ...
## $ MG1 : num 63 58 55 60 88 168 99 59 233 87 ...
## $ OM1 : num 5.7 5.6 5.8 6.2 8.4 6.4 7.1 3.8 5 9.2 ...
## $ CEC1 : num 20 22 17 23 27 27 21 14 27 20 ...
## $ PH1 : num 7.7 7.7 7.5 7.6 7.6 7 7.5 7.6 6.6 7.5 ...
## $ PHOS1 : num 13 9.2 10.5 8.8 13 9.3 10 9 15 12.6 ...
## $ POT1 : num 196 157 115 172 238 164 312 184 123 282 ...
Exercise 13: Load the women sample dataset. How many observations (cases) and how many attributes (fields) for each case? What are the column (field) and row names? What is the height of the first-listed woman?
data(women)
dim(women)
## [1] 15 2
colnames(women)
## [1] "height" "weight"
row.names(women)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
women[1,"height"]
## [1] 58
There are 15 observations (cases) and 2 attributes (fields) for each case. The column (field) names are height and weight; the row (case) names are "1" through "15". The first woman is 58 inches tall.
Exercise 14: List the factors in the oxford dataset.
names(which(sapply(oxford, is.factor)))
## [1] "PROFCLASS" "MAPCLASS"
Exercise 15: Identify the thin trees, defined as those with height/girth ratio more than 1 s.d. above the mean. You will have to define a new field in the dataframe with this ratio, and then use the mean and sd summary functions, along with a logical expression.
trees$hg <- trees$Height/trees$Girth
# thin trees have a height/girth ratio more than 1 s.d. above the mean
(thin.trees <- subset(trees, hg > (mean(trees$hg) + sd(trees$hg))))
## Girth Height Volume hg
## 1 8.3 70 10.3 8.433735
## 2 8.6 65 10.3 7.558140
## 5 10.7 81 18.8 7.570093
## 6 10.8 83 19.7 7.685185
## 9 11.1 80 22.6 7.207207
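The same selection can also be written with logical subscripting, without subset():

```r
data(trees)
trees$hg <- trees$Height / trees$Girth
# logical index: TRUE where the ratio exceeds the mean by more than 1 s.d.
(thin.trees <- trees[trees$hg > mean(trees$hg) + sd(trees$hg), ])
```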
Exercise 16: Display a histogram of the diamond prices in the diamonds dataset.
data(diamonds, package="ggplot2")
hist(diamonds$price)
Exercise 17: Write a model to predict tree height from tree girth. How much of the height can be predicted from the girth?
model.hg <- lm(Height ~ Girth, data=trees)
# equivalent to: model.hg <- lm(trees$Height ~ trees$Girth)
summary(model.hg)
##
## Call:
## lm(formula = Height ~ Girth, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.5816 -2.7686 0.3163 2.4728 9.9456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.0313 4.3833 14.152 1.49e-14 ***
## Girth 1.0544 0.3222 3.272 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.538 on 29 degrees of freedom
## Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
## F-statistic: 10.71 on 1 and 29 DF, p-value: 0.002758
Only about 24.4% of the variance in tree height (the adjusted R-squared) can be explained by girth.
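To include such computed values in the report text rather than retyping them, the adjusted R-squared can be extracted from the summary object:

```r
model.hg <- lm(Height ~ Girth, data = trees)
# adjusted R^2, expressed as a percentage of variance explained
round(summary(model.hg)$adj.r.squared * 100, 1)
```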
Exercise 18: Write a model to predict tree volume as a linear function of tree height and tree girth, with no interaction.
model.vhg <- lm(Volume ~ Height + Girth, data=trees)
summary(model.vhg)
##
## Call:
## lm(formula = Volume ~ Height + Girth, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4065 -2.6493 -0.2876 2.2003 8.4847
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
## Height 0.3393 0.1302 2.607 0.0145 *
## Girth 4.7082 0.2643 17.816 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.882 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
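The fitted model can then be used with predict() to estimate the volume of a new tree; the height and girth values below are chosen only for illustration:

```r
model.vhg <- lm(Volume ~ Height + Girth, data = trees)
# predicted volume for one hypothetical tree: 75 ft tall, 15 in girth
predict(model.vhg, newdata = data.frame(Height = 75, Girth = 15))
```

From the coefficients above this should be roughly 38 cubic feet.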
Exercise 19: Write a function to restrict the values of a vector to the range \(0 \ldots 1\). Any values \(< 0\) should be replaced with \(0\), and any values \(>1\) should be replaced with \(1\). Test the function on a vector with elements from \(-1.2\) to \(+1.2\) in increments of \(0.1\) – see the seq “sequence” function.
limit.01 <- function(v) {
  changed <- 0
  # replace negative values with 0
  ix <- which(v < 0); v[ix] <- 0
  changed <- changed + length(ix)
  # replace values above 1 with 1
  ix <- which(v > 1); v[ix] <- 1
  changed <- changed + length(ix)
  print(paste("Number of elements limited to 0..1:", changed))
  return(v)
}
Test of this function, here on a vector from \(-0.2\) to \(+1.2\):
(test.v <- seq(-0.2, 1.2, by=0.1))
## [1] -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2
limit.01(test.v)
## [1] "Number of elements limited to 0..1: 5"
## [1] 0.0 0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.0 1.0
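A vectorized alternative uses pmin() and pmax() to clamp each element to the interval, though unlike limit.01() it does not report how many elements were changed:

```r
# clamp each element of v into [0, 1]
clamp.01 <- function(v) pmax(0, pmin(1, v))
clamp.01(seq(-0.2, 1.2, by = 0.1))
```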
Bonus Exercise: Use tidyverse functions and pipes on the trees dataset, to select the trees (use the filter function) with a volume greater than the median volume (use the median function), compute the ratio of girth to height as a new variable (use the mutate function), and sort by this (use the arrange function) from thin to thick trees.
library(dplyr)
data(trees)
names(trees)
## [1] "Girth" "Height" "Volume"
trees %>%
  filter(Volume > median(Volume)) %>%
  mutate(thickness = round(Girth/Height, 3)) %>%
  arrange(thickness)
## Girth Height Volume thickness
## 1 12.9 85 33.8 0.152
## 2 13.3 86 27.4 0.155
## 3 14.2 80 31.7 0.178
## 4 14.0 78 34.5 0.179
## 5 13.7 71 25.7 0.193
## 6 14.5 74 36.3 0.196
## 7 16.3 77 42.6 0.212
## 8 17.5 82 55.7 0.213
## 9 17.3 81 55.4 0.214
## 10 13.8 64 24.9 0.216
## 11 16.0 72 38.3 0.222
## 12 17.9 80 58.3 0.224
## 13 18.0 80 51.5 0.225
## 14 18.0 80 51.0 0.225
## 15 20.6 87 77.0 0.237
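For comparison, the same pipeline can be written in base R without dplyr:

```r
data(trees)
# filter: keep the trees with above-median volume
big <- trees[trees$Volume > median(trees$Volume), ]
# mutate: add the girth/height ratio as a new variable
big$thickness <- round(big$Girth / big$Height, 3)
# arrange: sort from thin to thick
big[order(big$thickness), ]
```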