There are two aspects of the R language that we need to understand: objects and functions.
Object
An object can be thought of as a named storage space for a value, for example:
> x <- 916
Here we have created an object named x that stores the value 916. "<-" is the assignment operator in R. It is good to remember that everything in R is stored as an object.
Type x at the R prompt:
> x
You get the following output:
[1] 916
The 1 within square brackets tells us that the value next to it is the first element of the object x (in this case the only element). As we shall see, an object can contain several elements, and the numbers within square brackets then help you keep track of positions in the output.
Function
A function is a special type of R object designed to carry out some operation. A function usually takes some arguments and produces a result by executing a set of operations. R comes with a set of built-in functions for our use, and we can also create our own.
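To make "create our own functions" concrete, here is a minimal sketch. The name square_plus and its arguments are hypothetical, chosen only for illustration; they are not part of R.

```r
# Hypothetical helper: squares its first argument and adds an optional offset.
square_plus <- function(value, offset = 0) {
  value^2 + offset
}

square_plus(3)         # 9
square_plus(3, 1)      # 10

# A function is itself an object, so it has a mode like any other object.
mode(square_plus)      # "function"
```

If you run this sketch in your own session, square_plus will also appear in the output of ls() until you remove it with rm(square_plus).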
You can take a look at what objects are available in the current R session by typing:
> ls()
Since we have created one object, x, we get the following output from R:
[1] "x"
Objects you create stay in memory until you delete them. You can delete an object to free up memory with:
> rm(x)
Now type ls() to see the list of objects again. R outputs the following:
character(0)
This is an empty character vector: no objects remain in the session.
Object names may consist of any upper- and lower-case letters, the digits 0 to 9 (except at the beginning of the name), and also the period, ".", which behaves like a letter. Note that names in R are case sensitive, meaning that Color and color are two distinct objects. This is a frequent cause of frustration for beginners who keep getting "object not found" errors. If you face this type of error, start by checking the spelling and capitalization of the name of the object causing the error.
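The naming rules above can be checked with a short sketch. The object names used here (Color, color, COLOR, my.value) are illustrative only, and the exact wording of the error message may differ depending on your R version and language settings.

```r
Color <- "red"     # upper-case C
color <- "blue"    # lower-case c: a different object

identical(Color, color)    # FALSE

# Referring to a name that was never created raises an error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(COLOR, error = function(e) conditionMessage(e))
msg                        # typically: object 'COLOR' not found

my.value <- 1              # a period is allowed inside a name
exists("my.value")         # TRUE
```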
The most basic data object in R is a vector. When we created the object x, we created a vector of length one holding the value 916. Every object has a length and a mode.
The mode tells you the kind of data stored in the object. Vectors store a set of elements of the same atomic data type. The main atomic types are character, logical, numeric, and complex. Hence, you may have vectors of characters, logical values (TRUE or FALSE, which can be abbreviated to T or F), numbers, and complex numbers.
Let's create another vector (object):
> y <- 1:10
> y
Output: [1] 1 2 3 4 5 6 7 8 9 10
> length(y)
Output: [1] 10
> mode(y)
Output: [1] "numeric"
All elements of a vector must be of the same mode, that is, of the same type. Try the following:
> y <- c(1:5,"Hello")
> y
[1] "1" "2" "3" "4" "5" "Hello"
> mode(y)
[1] "character"
> length(y)
[1] 6
First, we used the c() function, which combines its arguments, to create the vector y. Within the c() function we used 1:5. This is just an alternative to typing 1,2,3,4,5.
Then we added another argument, "Hello" (character type). When we printed the elements of y, all of them appeared within double quotes. In the next line we checked the mode of the vector y and got "character". R has used type coercion: since we provided a character element, it converted all the numeric elements to character type to maintain the integrity of the vector.
Point to remember: All elements of a vector must be of the same mode.
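The coercion described above follows a fixed direction: logical values can become numbers, and anything can become a character string. A small sketch (the variable name digits is made up for this example):

```r
# Mixing types inside c() coerces everything to the most general type.
mode(c(TRUE, 2))        # "numeric": TRUE is converted to 1
mode(c(2, "a"))         # "character"
mode(c(TRUE, "a"))      # "character"

# Coercion can be reversed explicitly when the strings hold valid numbers.
digits <- c("1", "2", "3")
as.numeric(digits) + 1  # 2 3 4
```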
We can refer to elements of a vector in the following way:
> y[1]
Output: [1] "1"
We can change elements in the following way:
> y[1] = "New Value"
> y
Output: [1] "New Value" "2" "3" "4" "5" "Hello"
As expected, the first element of the vector y has been changed to "New Value". You might have noticed that we used "=" as the assignment operator this time, which works just as well here. Now let's change a value the old way:
> y[2] <- "Another Value"
> y
Output: [1] "New Value" "Another Value" "3" "4" "5" "Hello"
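Square brackets accept more than a single position. The sketch below uses a made-up vector called scores to show a few common forms; it is an illustration, not a complete treatment of indexing.

```r
scores <- c(10, 20, 30, 40, 50)   # sample values chosen for this sketch

scores[c(1, 3)]      # 10 30          (several positions at once)
scores[-1]           # 20 30 40 50    (everything except the first)
scores[scores > 25]  # 30 40 50       (elements meeting a condition)

# Several elements can be replaced in one assignment.
scores[2:3] <- 0
scores               # 10 0 0 40 50
```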
We can perform all sorts of operations on vectors.
> vect = 1:10
> vect
Output: [1] 1 2 3 4 5 6 7 8 9 10
> vect = vect + 2
> vect
Output: [1] 3 4 5 6 7 8 9 10 11 12
You can see that every element has been incremented by 2.
> vect = vect * 2
> vect
Output: [1] 6 8 10 12 14 16 18 20 22 24
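Every element of the vector has been multiplied by 2.
> vect = 1:10
> vect = sqrt(vect)
> vect
Output: [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
[9] 3.000000 3.162278
Here we first reset vect to 1:10, then took the square root of every element and assigned the result back to vect.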
Similarly, we can add 2 vectors:
> x = 1:10
> y = 11:20
> z = x + y
> z
Output: [1] 12 14 16 18 20 22 24 26 28 30
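Adding two vectors works element by element. As a sketch, the loop below computes the same result one position at a time; z_loop is a name introduced only for this comparison.

```r
x <- 1:10
y <- 11:20

# Vectorised addition, as in the text.
z <- x + y

# The same result computed one element at a time.
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}

all(z == z_loop)   # TRUE
```

The vectorised form is shorter and is the idiomatic way to write this in R.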
Missing Values
A missing value is represented by NA in R.
> x = 1:10
Let's replace the 9th element with NA:
> x[9] = NA
> sum(x)
[1] NA
Any arithmetic involving NA yields NA, so we need to tell sum to exclude the missing value:
> sum(x, na.rm=TRUE)
[1] 46
Try mean:
> mean(x, na.rm=TRUE)
[1] 5.111111
You can see that the total, 46, has been divided by 9, the number of non-missing values, rather than by 10. This is what you want.
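To see where the missing values are, and to confirm what na.rm does, the following sketch uses is.na(). The name observed is introduced here only for illustration.

```r
x <- 1:10
x[9] <- NA

is.na(x)           # TRUE only in position 9
which(is.na(x))    # 9
sum(is.na(x))      # 1 missing value

# Reproduce mean(x, na.rm = TRUE) by hand: drop the NA, then average.
observed <- x[!is.na(x)]
sum(observed) / length(observed)   # 5.111111
```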
Factor
Another important type of object in R is the factor. Factors provide a convenient way of handling categorical (nominal) variables. A factor has levels that determine the possible values the variable can take. Let's see an example:
> coffee = c("cold", "right", "hot", "hot", "right", "cold")
> factor(coffee)
Output: [1] cold right hot hot right cold
Levels: cold hot right
> table(factor(coffee))
Output:
 cold   hot right
    2     2     2
We can use the gl() function to generate sequences involving factors:
> lab=gl(3, 5, labels = c("child", "adult", "old"))
> lab
[1] child child child child child adult adult adult adult adult old old old old old
Levels: child adult old
> table(lab)
lab
child adult   old
    5     5     5
The general form of gl() is:
gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)
Arguments
n        an integer giving the number of levels.
k        an integer giving the number of replications.
length   an integer giving the length of the result.
labels   an optional vector of labels for the resulting factor levels.
ordered  a logical indicating whether the result should be ordered or not.
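Notice that factor() sorted the coffee levels alphabetically, while gl() kept the order given in labels. The sketch below shows how to set the level order yourself. The ordering cold, right, hot is an assumption made for this example, and the names temp_default and temp_ordered are hypothetical.

```r
coffee <- c("cold", "right", "hot", "hot", "right", "cold")

# Default: levels are the sorted unique values.
temp_default <- factor(coffee)
levels(temp_default)       # "cold" "hot" "right"
as.integer(temp_default)   # 1 3 2 2 3 1  (internal codes)

# Explicit level order, assumed here to be cold, right, hot.
temp_ordered <- factor(coffee, levels = c("cold", "right", "hot"))
levels(temp_ordered)       # "cold" "right" "hot"
as.integer(temp_ordered)   # 1 2 3 3 2 1
```

The internal integer codes show that a factor stores positions in the levels vector rather than the strings themselves.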
Generating Random Numbers
R has several functions that can be used to generate random sequences according to different probability density functions. The functions have the generic structure rfunc(n, par1, par2, ...), where func is the name of the probability distribution, n is the number of values to generate, and par1, par2, ... are the values of any parameters the density function requires. For instance, if you want five randomly generated numbers from a normal distribution with zero mean and unit standard deviation, type:
> rnorm(5)
[1] -0.4335585 -0.1092160 0.1082784 -0.5065135 -0.5878001
Similarly, to draw five values from a Student's t distribution with 7 degrees of freedom:
> rt(5, df=7)
[1] -1.07614048 -0.02142847 0.88955231 -1.42091564 1.29517603
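Because these numbers are random, your output will differ from the values printed above. If you need repeatable results, one common approach is to fix the seed with set.seed(). In the sketch below the seed value 42 and the names first and second are arbitrary choices.

```r
set.seed(42)            # 42 is an arbitrary seed chosen for this sketch
first <- rnorm(5)

set.seed(42)            # resetting the seed replays the same sequence
second <- rnorm(5)

identical(first, second)   # TRUE

# Other members of the r* family follow the same rfunc(n, ...) pattern.
runif(3, min = 0, max = 1)     # three values, uniform on [0, 1]
rnorm(3, mean = 10, sd = 2)    # three values, normal with mean 10 and sd 2
```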
Indexes
We will come back to this topic later.
Arrays and Matrices
A matrix is a vector whose elements are arranged in rows and columns. matrix(data, nrow, ncol) fills the matrix column by column:
> mat = 1:20
> mat = matrix(mat,4,5)
> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
The same 20 values can be arranged in 2 rows and 10 columns:
> mat = 1:20
> mat = matrix(mat,2,10)
> mat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    3    5    7    9   11   13   15   17    19
[2,]    2    4    6    8   10   12   14   16   18    20
Rows can be given names:
> rownames(mat)<-c("one","two")
> rownames(mat)
[1] "one" "two"
A row can then be extracted by name:
> mat["one",]
[1] 1 3 5 7 9 11 13 15 17 19
The same can be achieved with the following command:
> mat[1,]
[1] 1 3 5 7 9 11 13 15 17 19
What if you want the 6th column?
> mat[,6]
one two
 11  12
Arrays are similar to matrices, but an array can have more than 2 dimensions.
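As a sketch of the two ideas above, the example below fills a matrix row by row instead of column by column, and then builds a three-dimensional array. The names by_row and arr are made up for this example.

```r
# byrow = TRUE fills the matrix row by row instead of the default column by column.
by_row <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)
by_row[1, ]        # 1 2 3 4 5
dim(by_row)        # 4 5

# A three-dimensional array: 2 rows, 3 columns, 4 layers.
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)           # 2 3 4
arr[1, 1, 2]       # 7  (first element of the second layer)
arr[2, 3, 4]       # 24 (last element)
```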
List
A list is an ordered collection of elements. Unlike a vector, the elements of a list need not be of the same mode or length.
> student = list(id=67,
+ name='Jamal',
+ marks = c(77,88,99)
+ )
Typing the name of the list prints its elements:
> student
$id
[1] 67
$name
[1] "Jamal"
$marks
[1] 77 88 99
You may check the mode:
> mode(student)
Output:
[1] "list"
You may extract individual elements by using [n] notation where n is the subscript.
> student[1]
$id
[1] 67
R returns a list which is a sub-list of the list object 'student'. You can verify this:
> mode(student[1])
[1] "list"
In order to extract the value of 'id':
> student[[1]]
[1] 67
You have to use double square brackets along with the element subscript to extract the value itself.
You can verify the mode:
> mode(student[[1]])
[1] "numeric"
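List elements that have names can also be reached by name. The sketch below recreates the student list so that it runs on its own. The copy student2 and the field passed are hypothetical additions used only for illustration.

```r
student <- list(id = 67, name = "Jamal", marks = c(77, 88, 99))

names(student)            # "id" "name" "marks"
student$name              # "Jamal", same as student[["name"]]
student[["marks"]][2]     # 88

# Single brackets keep the list wrapper; double brackets drop it.
is.list(student["id"])    # TRUE
is.list(student[["id"]])  # FALSE

# Assigning to a new name adds an element (done on a copy here).
student2 <- student
student2$passed <- TRUE   # 'passed' is a made-up field
length(student2)          # 4
```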
Try these:
> student[[2]]
[1] "Jamal"
> mode(student[[2]])
[1] "character"
> student[[3]]
[1] 77 88 99
> mode(student[[3]])
[1] "numeric"
Dataframe
A data frame is a versatile data object in R. A data frame is like a spreadsheet: each column is a vector, and all elements in a column must be of the same mode. However, different columns can be of different modes. All columns in a data frame must be of the same length.
Create a dataframe:
> myset = data.frame(id = c(916,917,918), names = c("Dilir","Tr","Arif"))
> myset
   id names
1 916 Dilir
2 917    Tr
3 918  Arif
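You can refer to a variable (a column) in the following manner:
> myset$names
[1] Dilir Tr Arif
Levels: Arif Dilir Tr
The Levels line appears because this output comes from a version of R in which data.frame() converts character columns to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the column prints as a quoted character vector unless you pass stringsAsFactors = TRUE.
You can use the table function as well.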
> table(myset$names)
 Arif Dilir    Tr
    1     1     1
> myset$id
[1] 916 917 918
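A few more ways to inspect and subset a data frame are sketched below. The data frame is recreated so the example runs on its own. stringsAsFactors = TRUE is set explicitly here as an assumption, so that the names column is a factor as in the output shown above.

```r
myset <- data.frame(
  id = c(916, 917, 918),
  names = c("Dilir", "Tr", "Arif"),
  stringsAsFactors = TRUE   # assumed, to match the factor output in the text
)

str(myset)                  # compact summary of the columns and their types
nrow(myset)                 # 3
ncol(myset)                 # 2

myset[2, ]                  # second row, all columns
myset[myset$id > 916, ]     # rows whose id exceeds 916
myset[["id"]]               # 916 917 918, same as myset$id
```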



