dataminer: 2014

Monday, August 18, 2014

ggplot Quick Reference

http://docs.ggplot2.org/current/
http://www.computerworld.com/s/article/9239799/60_R_resources_to_improve_your_data_skills?pageNumber=2

Friday, August 15, 2014

Summarizing Data Frames

# column sums of Sepal.Width and Petal.Length within each Species
by(iris[,2:3], iris$Species, colSums)
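
The same per-group sums can also be produced as a regular data frame, which is often easier to reuse than the list-like result of by(). The following is a minimal sketch using aggregate() from base R as an alternative technique; the name sums is chosen here only for illustration.

```r
# Sum Sepal.Width and Petal.Length (columns 2 and 3 of iris) per Species.
# aggregate() returns one row per group instead of a "by" object.
sums <- aggregate(cbind(Sepal.Width, Petal.Length) ~ Species,
                  data = iris, FUN = sum)
sums
```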

setwd("f:/coursera/coursera/exploratory data analysis")
options(stringsAsFactors=FALSE)
# build two small data frames that share the studentn key
student <- c("dilir","saif","enam","rafiq")
studentn <- c(916,914,937,891)
mark <- c(400,300,250,500)
students <- data.frame(student, studentn)
marks <- data.frame(studentn, mark)
# save both to disk, then clear the workspace
saveRDS(students,"students.rds")
saveRDS(marks, "marks.rds")
rm(list=ls())

# read the saved data frames back
students <- readRDS("students.rds")
marks <- readRDS("marks.rds")
students

# append a row, then drop it again by filtering on student
students <- rbind(students,data.frame(student="selim", studentn=934))
students
students <- students[students$student!='selim',]
students
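
The marks data frame is read back above but not used afterwards. One possible next step, shown here only as a sketch, is to join the two data frames on their shared studentn column with merge(). The data frames are rebuilt inline so the snippet runs without the .rds files, and the name merged is an assumption made for illustration.

```r
# Rebuild the two data frames from the post so the example is self-contained
students <- data.frame(student  = c("dilir", "saif", "enam", "rafiq"),
                       studentn = c(916, 914, 937, 891))
marks    <- data.frame(studentn = c(916, 914, 937, 891),
                       mark     = c(400, 300, 250, 500))

# Join on the shared key; rows come back sorted by studentn
merged <- merge(students, marks, by = "studentn")
merged
```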

Thursday, July 31, 2014

Creating Factor from Continuous Variable

Here we use the weight column of the women dataset to create a factor with 3 levels:
> wf <- cut(women$weight,3)
> table(wf)
wf
(115,131] (131,148] (148,164]
        6         5         4
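
When equal-width intervals are not what is needed, cut() also accepts explicit break points and level labels. A minimal sketch follows; the break points, the labels, and the name wf2 are made up for illustration.

```r
# Hypothetical break points and labels, chosen only for illustration.
# Intervals are right-closed by default: (110,130], (130,150], (150,170]
wf2 <- cut(women$weight,
           breaks = c(110, 130, 150, 170),
           labels = c("light", "medium", "heavy"))
table(wf2)
```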

Monday, July 14, 2014

Fetch Stock Data : R Program

##### Author : Dilir Khan ## http://analytics.dilir.net #### July 15, 2014###
#Language: R
r.page <- readLines("http://www.stockbangladesh.com/users/index")
#r.page <- readLines("stock.dat")
# locate the "INDEX MOVER" heading; all offsets below are relative to it
spot <- grep("<h3>INDEX MOVER", r.page)[1]
if (is.na(spot)) stop("INDEX MOVER heading not found")
# get stock symbol & Price
# assumes each of the five rows spans 7 lines, with the price on the line after the symbol
sym.lines <- r.page[spot + c(21, 28, 35, 42, 49)]
price.lines <- r.page[spot + c(22, 29, 36, 43, 50)]
m1 <- regmatches(sym.lines, regexec("([A-Z]+)</", sym.lines))
m2 <- regmatches(price.lines, regexec("([0-9]+\\.?[0-9]*)</td>", price.lines))
# element 1 of each match is the full match, element 2 is the captured group
for (i in seq_along(m1)) {
  print(paste(m1[[i]][2], m2[[i]][2], sep = " -> "))
}
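
The extraction step can be tried offline on a couple of sample lines. The sketch below uses hypothetical HTML (the real page layout may differ) and the same two patterns as the program above to show what regexec() and regmatches() return.

```r
# Hypothetical HTML lines, made up for illustration only
html <- c('<td><a href="#">ABC</a></td>',
          '<td>123.45</td>')

# regmatches() returns a list; each element holds the full match
# followed by the text captured by the parenthesised group
sym   <- regmatches(html[1], regexec("([A-Z]+)</", html[1]))
price <- regmatches(html[2], regexec("([0-9]+\\.?[0-9]+)</td>", html[2]))

print(paste(sym[[1]][2], price[[1]][2], sep = " -> "))
```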

Monday, June 30, 2014

Online Courses are available from reputed educational institutes

Hi everybody! I wanted to tell you that great online courses are being offered by famous institutes. You can join for free and earn a certificate by spending 4 to 5 hours per week on a course for 1 to 2 months.

www.coursera.org
www.edx.org

You should give it a try.

Good Luck

Dilir

Saturday, April 5, 2014

How to change Column heading(s) in R

x <- 1:5
x
headings <- c("Serial","Year", "Sales", "Profit", "SalesRep")
# names() labels the elements of a vector; on a data frame it sets the column headings
names(x) <- headings
x
#Change the heading of the 5th element
names(x)[5] <- "Sales Rep"
x
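
The same names() calls work on a data frame, where they change the actual column headings. A minimal sketch, using a small data frame invented for illustration:

```r
# Hypothetical data frame, made up for illustration
sales <- data.frame(a = 1:2, b = c(2013, 2014), c = c(100, 150))

# Replace all column headings at once
names(sales) <- c("Serial", "Year", "Sales")

# Change only the heading of the 3rd column
names(sales)[3] <- "Total Sales"
names(sales)
```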

Monday, March 17, 2014

Model Evaluation

We expect the "Multiple R-squared" value of the simplified model to be slightly worse than that of the initial model. It can't be better than the "Multiple R-squared" value of the initial model.

EXPLANATION

When we remove insignificant variables, the "Multiple R-squared" can never improve; it will be worse, though usually only slightly. This is due to the nature of a linear regression model. It is always possible for the regression model to make a coefficient zero, which would be the same as removing the variable from the model. The fact that the coefficient is not zero in the initial model means it must be helping the R-squared value, even if it is only a very small improvement. So when we force the variable to be removed, it will decrease the R-squared a little bit. However, this small decrease is worth it to have a simpler model.

In contrast, when we remove insignificant variables, the "Adjusted R-squared" will frequently be better. This value accounts for the complexity of the model, and thus tends to increase as insignificant variables are removed, and decrease as insignificant variables are added.
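
The comparison can be tried on any pair of nested models. The sketch below uses the built-in mtcars data; the dataset and the choice of predictors are assumptions made for illustration and are not the models discussed in this post.

```r
# mtcars and these predictors are assumptions made for illustration
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)   # qsec dropped

r2  <- c(full    = summary(full)$r.squared,
         reduced = summary(reduced)$r.squared)
adj <- c(full    = summary(full)$adj.r.squared,
         reduced = summary(reduced)$adj.r.squared)

r2    # the reduced model cannot score higher than the full model
adj   # can move in either direction, because extra terms are penalized
```

Whether the adjusted value actually rises depends on how little the dropped variable contributed, so it is worth printing both rows before deciding which model to keep.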

Residuals are the differences between the actual and predicted values.