This is where everything we have learnt so far comes together: data types, vectors, matrices, and lists.
A data frame is a collection of vectors of equal length. Let's see how one is defined. Its class is data.frame, which is different from a plain list, even though R stores it internally as a list of columns. We use data.frame() to create one.
Example:
> cdg = c(23.5, 12,2,40,45)
> ndls = c(12,20,0, 35, 20)
> rainfall = data.frame(cdg, ndls)
> rainfall
   cdg ndls
1 23.5   12
2 12.0   20
3  2.0    0
4 40.0   35
5 45.0   20
> # note that data.frame() picks up the variable names and uses them as column names
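One way to check what data.frame() has built is to inspect the object. The sketch below rebuilds the same rainfall data and looks at its class, storage type, and shape, using only base R functions. The last lines illustrate the "equal length" requirement with two made-up columns, a and b.

```r
# Rebuild the example data frame from two equal-length numeric vectors
cdg  <- c(23.5, 12, 2, 40, 45)
ndls <- c(12, 20, 0, 35, 20)
rainfall <- data.frame(cdg, ndls)

class(rainfall)    # "data.frame"
typeof(rainfall)   # "list": a data frame is stored as a list of columns
dim(rainfall)      # 5 2 (rows, columns)
names(rainfall)    # "cdg" "ndls"
str(rainfall)      # compact summary of every column

# Columns whose lengths cannot be reconciled raise an error
# (a and b are illustrative names, not part of the example data)
res <- try(data.frame(a = 1:5, b = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```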
> month = c("Jan", "Feb", "Mar", "Apr", "May")
> month
[1] "Jan" "Feb" "Mar" "Apr" "May"
> cbind(rainfall, month)
   cdg ndls month
1 23.5   12   Jan
2 12.0   20   Feb
3  2.0    0   Mar
4 40.0   35   Apr
5 45.0   20   May
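Note that cbind() returns a new data frame and leaves rainfall itself unchanged, which is why the month column is absent when rainfall is printed again. A minimal sketch of keeping the result, where the name rainfall_m is only an illustration:

```r
cdg  <- c(23.5, 12, 2, 40, 45)
ndls <- c(12, 20, 0, 35, 20)
rainfall <- data.frame(cdg, ndls)
month <- c("Jan", "Feb", "Mar", "Apr", "May")

# cbind() returns a copy with the extra column; assign it to keep the result
rainfall_m <- cbind(rainfall, month)   # rainfall_m is a hypothetical name
names(rainfall_m)   # "cdg" "ndls" "month"
names(rainfall)     # "cdg" "ndls" (the original is untouched)
```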
> row.names(rainfall) = c("Jan", "Feb", "Mar", "Apr", "May")
> rainfall
     cdg ndls
Jan 23.5   12
Feb 12.0   20
Mar  2.0    0
Apr 40.0   35
May 45.0   20
> # note that every row now has a label
Further, as in a list, the columns can be named when the data frame is defined:
> country = data.frame(states=c("Andhra Pradesh", "Uttar Pradesh", "Punjab"), capital=c("Hyderabad", "Lucknow", "Chandigarh"))
> country
          states    capital
1 Andhra Pradesh  Hyderabad
2  Uttar Pradesh    Lucknow
3         Punjab Chandigarh
> # here the column names/labels are used like keys in an associative array
Columns are accessed the same way as list elements, that is, with [[ ]]:
> rainfall[[2]]
[1] 12 20 0 35 20
> rainfall[["ndls"]]
[1] 12 20 0 35 20
> rainfall[,"ndls"]
[1] 12 20 0 35 20
> rainfall[,1:2]
     cdg ndls
Jan 23.5   12
Feb 12.0   20
Mar  2.0    0
Apr 40.0   35
May 45.0   20
> rainfall[,c("ndls", "cdg")]
    ndls  cdg
Jan   12 23.5
Feb   20 12.0
Mar    0  2.0
Apr   35 40.0
May   20 45.0
> rainfall[c(TRUE, TRUE, FALSE, FALSE, FALSE),]
     cdg ndls
Jan 23.5   12
Feb 12.0   20
> rainfall[c(2,5), ]
    cdg ndls
Feb  12   20
May  45   20
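The indexing forms above differ in what they return: [[ ]] gives back the column as a plain vector, list-style [ ] with a column name keeps a data frame, and matrix-style [ , ] with a single column drops to a vector unless drop = FALSE is given. A short sketch of that difference, rebuilding rainfall so that it runs on its own:

```r
rainfall <- data.frame(cdg  = c(23.5, 12, 2, 40, 45),
                       ndls = c(12, 20, 0, 35, 20),
                       row.names = c("Jan", "Feb", "Mar", "Apr", "May"))

is.numeric(rainfall[["ndls"]])                    # TRUE: a plain numeric vector
is.data.frame(rainfall["ndls"])                   # TRUE: list-style [ ] keeps a data frame
is.data.frame(rainfall[, "ndls"])                 # FALSE: a single column drops to a vector
is.data.frame(rainfall[, "ndls", drop = FALSE])   # TRUE: drop = FALSE prevents that

rainfall["Apr", "cdg"]        # a single cell by row and column label: 40
rainfall[c("Jan", "May"), ]   # rows selected by their labels
```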
Let's learn a simple trick here:
> rainfall$cdg
[1] 23.5 12.0 2.0 40.0 45.0
> rainy = rainfall$cdg > 10
> rainy
[1] TRUE TRUE FALSE TRUE TRUE
> rainfall[rainy,]
     cdg ndls
Jan 23.5   12
Feb 12.0   20
Apr 40.0   35
May 45.0   20
> typeof(rainy)
[1] "logical"
The comparison rainfall$cdg > 10 is applied to every member of the vector and yields a logical vector, which then selects the matching rows.
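The logical vector does not need a name of its own: the comparison can be written directly inside the brackets, and several conditions can be combined element-wise with & and |. A minimal sketch, in which the threshold 15 is an arbitrary illustration:

```r
rainfall <- data.frame(cdg  = c(23.5, 12, 2, 40, 45),
                       ndls = c(12, 20, 0, 35, 20),
                       row.names = c("Jan", "Feb", "Mar", "Apr", "May"))

# Condition written inline instead of via a separate variable
rainfall[rainfall$cdg > 10, ]

# Months where both columns exceed 15: & combines the tests element-wise
both <- rainfall[rainfall$cdg > 15 & rainfall$ndls > 15, ]
rownames(both)             # "Apr" "May"

# subset() from base R expresses the same filter more compactly
subset(rainfall, cdg > 15 & ndls > 15)

# which() converts the logical vector into row positions
which(rainfall$cdg > 10)   # 1 2 4 5
```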
Different packages within R support different data file formats. Let's see their mapping below:
Format, Package/Library, Function used
Set the current working directory to the location where the data files are stored:
> setwd("/home/vineet/R/data")
> getwd()
[1] "/home/vineet/R/data"
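setwd() returns the previous working directory invisibly, which makes it easy to switch back afterwards. The sketch below uses tempdir() rather than the path above so that it runs on any machine, and the file name rainfall.csv is only an illustration. It writes a small data frame with write.csv() and reads it back with read.csv(), both from base R.

```r
old <- setwd(tempdir())        # switch directory and remember the old one

rainfall <- data.frame(month = c("Jan", "Feb", "Mar"),
                       cdg   = c(23.5, 12, 2))

# "rainfall.csv" is a hypothetical file name; relative paths resolve against getwd()
write.csv(rainfall, "rainfall.csv", row.names = FALSE)
file.exists("rainfall.csv")    # TRUE

back <- read.csv("rainfall.csv")
str(back)                      # same columns as the data frame that was written

setwd(old)                     # restore the original working directory
```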