R Programming Tutorial (Part 2 - Data Collections!)


Introduction

Working off of my previous R programming post here, I'll continue with the core of R programming: data collections.

Data Collections

Frequently, your program will require that you store multiple data items together. This might be because you have a group of data that should be referenced together, or even to reduce the number of variables you have to define. Regardless of why, there are four data collections that you can utilize in R: Vectors, Matrices, Lists, and DataFrames.

Vectors

The most basic object in R is the vector, which contains objects of the same class. Let's try creating vectors of different classes. We can create a vector using c():

a <- c(1.8, 4.5)        # numeric
b <- c(1 + 2i, 3 - 6i)  # complex
d <- c(23L, 44L)        # integer (the L suffix is required; plain 23 is stored as numeric)
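If you want to check which class a vector actually ended up with, class() will tell you. Here's a minimal sketch; the variable names are made up for this example and aren't used elsewhere in the post.

```r
# Hypothetical vector names, chosen only for this sketch
num_vec  <- c(1.8, 4.5)          # numeric (double)
cplx_vec <- c(1 + 2i, 3 - 6i)    # complex
int_vec  <- c(23L, 44L)          # integer: the L suffix requests integer storage

class(num_vec)    # "numeric"
class(cplx_vec)   # "complex"
class(int_vec)    # "integer"
class(c(23, 44))  # "numeric": without the L suffix, whole numbers are doubles

# Mixing classes in one c() call coerces everything to a common class
mixed <- c(1, "two", 3)
class(mixed)      # "character"
```

Because a vector can only hold one class, R quietly converts the numbers in mixed to strings rather than raising an error.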

Challenge

Using the variable vec1, create a vector with 5 numerical values.

vec1 <- c(1,2,3,4,5)
print(vec1)
[1] 1 2 3 4 5

Matrices

When a vector is given rows and columns (the dimension attribute), it becomes a matrix. It consists of elements of the same class, such as the following:

my_matrix <- matrix(1:6, nrow=3, ncol=2)
print(my_matrix)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
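To make the "vector plus dimension attribute" idea concrete, here's a small sketch that builds the same matrix by setting dim() on a plain vector. The names v, m, and m_byrow are just placeholders for this example.

```r
# Start from a plain vector (hypothetical name v)
v <- 1:6
dim(v) <- c(3, 2)   # 3 rows, 2 columns; values fill column by column

# Same result as calling matrix() directly
m <- matrix(1:6, nrow = 3, ncol = 2)
identical(v, m)     # TRUE

# byrow = TRUE fills row by row instead
m_byrow <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
m_byrow[1, ]        # 1 2
dim(m_byrow)        # 3 2
```

Note that R fills matrices column by column unless you ask for byrow = TRUE, which is why the first column of my_matrix is 1 2 3.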

Challenge

  1. Create two vectors with the values 1 to 5 and 10.5 to 12.5, respectively. Then concatenate these two vectors into one vector, named vec1. What is the class? Call the class() function and assign its result to the variable class1.

  2. Change the 4th element of the above vector to the word 'four' and assign it to the vector vec2. Did this change the class? Call the class() function and assign its result to the variable class2.

  3. Using the rep() function, create a vector that repeats the values 1 2 3 twice. Assign this vector the variable vec3. (Result: 1 2 3 1 2 3)

  4. Create a 3 by 4 matrix where each row has the same value. Assign this matrix to the variable matrix1. (hint: use the rep() function)

  5. Create a 4 by 3 matrix where each row has the same value. Assign this matrix to the variable matrix2. (hint: use the rep() function)

Lists

Lists are present in R, as well as in most other programming languages. A list is a data structure that can hold any number of other data structures of any type. For example, if you have a vector, a dataframe, and a character object, you can put all of those into one list object.

Constructing a List

To begin constructing a list, we'll create three variables with different data types. Since lists support mixed types, we'll use these to add to a list.

vec <- 1:4
num <- 17
char <- "Hello!"

Then you can add all three objects to one list using the list() function:

list1 <- list(vec, num, char)

print(list1)
    [[1]]
    [1] 1 2 3 4
    
    [[2]]
    [1] 17
    
    [[3]]
    [1] "Hello!"

You can also turn an object into a list by using the as.list() function. When it is applied to a vector, every element of the vector becomes a separate component of the list.
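Here's a quick sketch of as.list() applied to the same kind of vector we built above. The name vec_as_list is just a placeholder for this example.

```r
vec <- 1:4
vec_as_list <- as.list(vec)   # hypothetical name for the converted list

length(vec_as_list)   # 4: one component per element of the vector
class(vec_as_list)    # "list"
vec_as_list[[2]]      # 2
```

Compare this with list(vec), which would give a list of length 1 holding the whole vector as a single component.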

Manipulating a List

We can put names on the components of a list using the names() function, which is useful for extracting components. We could have also named the components when we created the list.

names(list1) <- c("Numbers", "Some.data", "Letters")
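For the "name them at creation" route, a minimal sketch looks like this. I'm using list2 as a hypothetical name so it doesn't clash with list1.

```r
vec <- 1:4
num <- 17
char <- "Hello!"

# name = value pairs inside list() set the component names directly
list2 <- list(Numbers = vec, Some.data = num, Letters = char)

names(list2)   # "Numbers" "Some.data" "Letters"
```

Either approach ends up with the same names; setting them in list() just saves the separate names() call.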

Extracting Components

The first way you can extract an object from the list is by using the [[ ]] operator.

list1[[3]]

'Hello!'

It's also possible to extract components using the component’s name, as shown below:

list1$Letters

'Hello!'

Subsetting a List

If you want to take a subset of a list, you can use the [ ] operator and c() to choose the components:

list1[c(1, 3)]
$Numbers
1 2 3 4
$Letters
'Hello!'
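The difference between [ ] and [[ ]] trips a lot of people up, so here's a small sketch. It uses a stand-in list called demo (a hypothetical name) with the same shape as list1.

```r
# Hypothetical list with the same components as list1
demo <- list(Numbers = 1:4, Some.data = 17, Letters = "Hello!")

sub <- demo[1]       # single brackets: still a list, of length 1
class(sub)           # "list"

item <- demo[[1]]    # double brackets: the component itself
class(item)          # "integer"

# Component names also work inside [ ]
by_name <- demo[c("Numbers", "Letters")]
names(by_name)       # "Numbers" "Letters"
```

A rough rule of thumb: [ ] always hands back a list, while [[ ]] and $ hand back whatever is stored inside it.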

We can also add a new component to the list or replace a component using the $ or [[ ]] operators, such as the following two examples:

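# cars is a data set that ships with R; lm() fits a linear model to it
list1$newthing <- lm(dist ~ speed, data = cars)
# assigning to the next free position appends a component
list1[[5]] <- "new component"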

Finally, we can delete a component of a list by setting it equal to NULL:

list1$Letters <- NULL
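To see how adding, replacing, and deleting affect the list, here's a sketch that tracks the length at each step. Again, demo is a hypothetical stand-in list and not the list1 from above.

```r
demo <- list(Numbers = 1:4, Some.data = 17, Letters = "Hello!")

demo$extra <- TRUE                      # add a component by name
demo[[length(demo) + 1]] <- "appended"  # add a component by position
length(demo)                            # 5

demo[["Some.data"]] <- 18               # replace an existing component
demo$Some.data                          # 18

demo$Letters <- NULL                    # delete; later components shift up
length(demo)                            # 4
demo[[3]]                               # TRUE (extra moved from position 4 to 3)
```

Keep in mind that deleting a component renumbers everything after it, so positional indexes you relied on earlier may point somewhere else afterwards.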

Describing Lists

Now we'll go over ways in which we can extract list properties.

Class

We can use class() to check the class of the list itself, as well as the class of any one of its components.

class(list1)

'list'

class(list1[[1]])

'integer'

Size

You can find the size of a list, meaning its number of top-level components, with the length() function, like in the following:

length(list1)

4

Converting

Finally, we can convert a list into a matrix, dataframe, or vector in a number of different ways. The first, most basic way is to use unlist(), which just turns the whole list into one long vector:

unlist(list1)
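Here's a rough sketch of what these conversions look like on a small list. The names small, flat, and as_df are hypothetical, and as.data.frame() is just one of several ways to get a dataframe out of a list; it assumes the components all have the same length.

```r
small <- list(a = 1:3, b = 4:6)   # hypothetical list of equal-length vectors

flat <- unlist(small)             # one long named vector
flat
# a1 a2 a3 b1 b2 b3
#  1  2  3  4  5  6

as_df <- as.data.frame(small)     # each component becomes a column
dim(as_df)                        # 3 2

# Mixed types are coerced to a common class when flattened
unlist(list(1, "two"))            # "1" "two"
```

Since a vector can only hold one class, unlist() on a mixed list falls back to the most general class, which is usually character.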

Challenge

  1. Create a new vector that performs the operation 2x^2 for x from 0 to 6. Assign this vector to the variable f.

  2. Create a new vector that contains the value 0 repeated 5 times. Assign this vector to the variable r.

  3. Create a list with vectors f and r, as well as with the element, 'hello'. Assign this list to the variable list1.

DataFrame

DataFrames are used to store tabular data. A DataFrame is similar to a matrix in that there are rows and columns, but it's different because every element does not have to be the same class. Each column is a vector with its own class, and under the hood a DataFrame is a list of equal-length vectors, with each column being one component of that list.

df <- data.frame(name = c("ash","jane","paul","mark"), score = c(67,56,87,91))
print(df)
  name score
1  ash    67
2 jane    56
3 paul    87
4 mark    91
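To poke at the "list of columns" idea, here's a short sketch using the same df. Everything here is base R; the only assumption is that you're on R 4.0 or later, where the name column is stored as character instead of factor.

```r
df <- data.frame(name = c("ash", "jane", "paul", "mark"),
                 score = c(67, 56, 87, 91))

sapply(df, class)   # one class per column
df$score            # 67 56 87 91
mean(df$score)      # 75.25
df[2, "name"]       # second row of the name column
is.list(df)         # TRUE: a dataframe is a list of columns
dim(df)             # 4 2
```

The $ operator works on dataframe columns the same way it works on named list components, which is handy once you start doing calculations on a single column.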

DataFrame objects are incredibly useful when working with tabular data, such as the contents of a csv file. You'll see just how useful they are soon enough!

Challenge

Using the variable df1, create a 3x3 dataframe using three lists.

To summarize this succinctly,

Structure   Multidimension   Multiple Types
Vector      Not Capable      Not Capable
Matrix      Capable          Not Capable
List        Not Capable      Capable
DataFrame   Capable          Capable

Final Words

If you liked any of this material, feel free to check out the GitHub here and stay tuned for more posts by me! If you have any solutions or questions about the challenge questions, drop a comment and I'll get back to you.