dataminer: October 2013

Sunday, October 27, 2013

Nice tutorials to discover R

Nice tutorials to discover R http://t.co/ckBJskmpvK via @rbloggers
— Dilir Akhtar Khan (@dilirkhan) October 27, 2013

Normalize Data in R (Calculate Z scores)

The scale() function computes Z scores (standardizes, or normalizes, the data) in R.

To calculate the Z score of each value of a variable, subtract the mean of all data points from that value and divide the result by the standard deviation of the variable. scale() does this in a single call.
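Written out as a formula, with s taken to be the sample standard deviation (the n - 1 denominator, which is what R's sd() uses and what scale() divides by when center = TRUE), and a worked value for the x used below:

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% Worked example for x = (2, 4, 6, 8):
\bar{x} = 5, \quad
s = \sqrt{\tfrac{9 + 1 + 1 + 9}{3}} = \sqrt{\tfrac{20}{3}} \approx 2.581989, \quad
z_1 = \frac{2 - 5}{2.581989} \approx -1.161895
```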

In the R console, type:

> x = c(2,4,6,8)

This creates a variable x.

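To center the data (subtract the mean from each data point) and scale it (divide each centered value by the standard deviation) in one step, storing the result in z:

> z <- scale(x, center = TRUE, scale = TRUE) # scale = FALSE would only center, without dividing by the standard deviation

> z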

           [,1]
[1,] -1.1618950
[2,] -0.3872983
[3,]  0.3872983
[4,]  1.1618950
attr(,"scaled:center")
[1] 5
attr(,"scaled:scale")
[1] 2.581989
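As a cross-check, here is a small sketch that computes the same Z scores by hand with mean() and sd() and compares them with scale(). It also shows the centering-only case (scale = FALSE). The names z_manual, z_scale and x_centered are illustrative.

```r
# Sketch: compute Z scores by hand and compare them with scale()
x <- c(2, 4, 6, 8)

# Manual Z scores: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes; as.vector() drops them
z_scale <- as.vector(scale(x, center = TRUE, scale = TRUE))

print(z_manual)
print(all.equal(z_manual, z_scale))  # TRUE when both approaches agree

# Centering only: scale = FALSE subtracts the mean but does not divide by sd
x_centered <- as.vector(scale(x, center = TRUE, scale = FALSE))
print(x_centered)  # should be -3, -1, 1, 3
```

The attr(,"scaled:center") and attr(,"scaled:scale") values printed above are exactly mean(x) and sd(x), which is why the two approaches agree.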

Tuesday, October 22, 2013

T-Test in R

The data file, 98.6 t-test.xlsx, needs to be converted to .csv first (for example, with Save As in Excel).

http://ww2.coastal.edu/kingw/statistics/R-tutorials/singlesample.html

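normtmp <- read.csv("e:/r/98.6 t-test.csv", header = TRUE)
qqnorm(normtmp$tmp)          # normal Q-Q plot of the temperatures
qqline(normtmp$tmp)          # reference line: points close to it suggest normality
plot(density(normtmp$tmp))   # smoothed distribution of the sample
shapiro.test(normtmp$tmp)    # formal test of normality
t.test(normtmp$tmp, mu = 98.6, conf.level = .99, alternative = "two.sided")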
# output not shown

Note: setting alternative to "two.sided" was unnecessary, since that is the default.
We can now reject the null hypothesis at any reasonable alpha level we might have chosen.
From the sample, we would estimate the mean human body temperature to be 98.25 degrees (the sample mean on the last line of output).
The 99% confidence interval for the population mean runs from 98.08111 to 98.41735 degrees. Because 98.6 lies outside this interval, the null hypothesis is rejected even at the 0.01 level.
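The t statistic, p-value and confidence interval that t.test() reports can be reproduced by hand, which makes clear what the test computes. The 98.6 t-test data are not included in this post, so the sketch below uses a short vector of made-up temperatures (an assumption, purely for illustration); its results will not match the numbers quoted above.

```r
# Sketch: one-sample t-test by hand vs. t.test()
# NOTE: these temperatures are made-up illustrative values,
# not the 98.6 t-test data used above.
temps <- c(98.2, 98.4, 97.9, 98.6, 98.0, 98.3, 98.1, 98.5)
mu0 <- 98.6                      # hypothesised population mean
n <- length(temps)

se <- sd(temps) / sqrt(n)                        # standard error of the mean
t_manual <- (mean(temps) - mu0) / se             # t statistic, df = n - 1
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

# 99% confidence interval: mean +/- critical t value * standard error
t_crit <- qt(0.995, df = n - 1)
ci_manual <- mean(temps) + c(-1, 1) * t_crit * se

tt <- t.test(temps, mu = mu0, conf.level = 0.99)
print(c(t_manual = t_manual, t_builtin = unname(tt$statistic)))
print(c(p_manual = p_manual, p_builtin = tt$p.value))
print(rbind(manual = ci_manual, builtin = as.vector(tt$conf.int)))
```

The interval check and the p-value check agree by construction: mu0 falls outside the 99% interval exactly when the two-sided p-value is below 0.01.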

Friday, October 11, 2013

Different Types of Plots in R

To get the data set, click this link: Friends Data from Carnegie Mellon University. The data will be downloaded to your computer. Double-click the downloaded file; a new R session will start with the data loaded into a variable named friends.

To take a look at the data, type:
> friends

Create a table:
> t <- table(friends)

View the table:

> t

friends
No difference  Opposite sex      Same sex
          602           434           164
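If the Carnegie Mellon download is not available, a stand-in with the same counts can be rebuilt from the table printed above. This is a hypothetical reconstruction: it reproduces the category counts, not the original file.

```r
# Sketch: rebuild a factor with the same counts as the friends table above
# (hypothetical stand-in for the downloaded Carnegie Mellon data)
friends <- factor(rep(c("No difference", "Opposite sex", "Same sex"),
                      times = c(602, 434, 164)))

t <- table(friends)   # same name as in the post so later commands work unchanged
print(t)
print(sum(t))         # total number of responses
```

With friends defined this way, the barplot() and pie() commands that follow can be run as written.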

> barplot(t)

For horizontal bars:

> barplot(t, horiz = TRUE)

To add a title, a y-axis label and a fill color, try:
> barplot(t, horiz = TRUE, main = "Friends Distribution", ylab = "Make Friends With", col = "darkblue")

For more examples, check: http://www.statmethods.net/graphs/bar.html

Pie Chart
------------
> pie(t)
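By default pie() labels each slice with its category name only. One way to add percentages is sketched below, with the counts typed in from the friends table (the names counts, pct and labels are illustrative):

```r
# Sketch: pie chart with percentage labels
# Counts copied from the friends table shown earlier
counts <- c("No difference" = 602, "Opposite sex" = 434, "Same sex" = 164)

pct <- round(100 * counts / sum(counts), 1)       # share of each category, in %
labels <- paste0(names(counts), " (", pct, "%)")  # e.g. "Same sex (13.7%)"

pie(counts, labels = labels, main = "Friends Distribution")
```

The same labels vector could be passed to barplot() through its names.arg argument if percentages are wanted on the bar chart as well.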

To create a 3D pie chart, install and load the plotrix package, then call pie3D():

> install.packages("plotrix")

> library(plotrix)

> pie3D(t, labels = names(t), explode = 0.1)

Saturday, October 5, 2013

Chi Square Test

Copy the following data into a text editor, press Enter after the last line so the file ends with a newline, and save it as chisq.csv.

Heart Rate Increased, No Heart Rate Increase
Treated, 36,14
Not Treated, 30, 25

For details on the data, visit http://math.hws.edu/javamath/ryan/ChiSquare.html

Here we test whether a drug treatment affects heart rate.
Ho: The proportion of animals whose heart rate increased is independent of drug treatment.
Ha: The proportion of animals whose heart rate increased is associated with drug treatment.

Read the data into R:
> x <- read.csv("e:/r/chisq.csv")

If the file does not end with a newline, you are likely to get the following warning:

Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'Chi_Square.csv'

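The warning is harmless: the last line is still read. Now run the test:

> chisq.test(x, correct = FALSE) # correct = FALSE turns off Yates' continuity correction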

Output:

Pearson's Chi-squared test

data: x

X-squared = 3.4177, df = 1, p-value = 0.0645

Look at the p-value.

The p-value of 0.0645 is greater than the conventional significance level of 0.05, so we fail to reject the null hypothesis. In other words, the data do not show a statistically significant association between drug treatment and the proportion of animals whose heart rate increased.
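To see where X-squared = 3.4177 comes from, the sketch below rebuilds the 2x2 table as a matrix (so no CSV file is needed), computes the expected counts under independence, and sums (observed - expected)^2 / expected. The shortened column names are illustrative; chisq.test() is used only to confirm the hand calculation.

```r
# Sketch: Pearson's chi-squared test by hand for the 2x2 heart-rate table
obs <- matrix(c(36, 30, 14, 25), nrow = 2,
              dimnames = list(c("Treated", "Not Treated"),
                              c("Increased", "No Increase")))

# Expected count for each cell = row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

x2 <- sum((obs - expected)^2 / expected)      # Pearson's X-squared
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)      # 1 for a 2x2 table
p <- pchisq(x2, df = dof, lower.tail = FALSE) # upper-tail probability

res <- chisq.test(obs, correct = FALSE)
print(expected)
print(c(x2_manual = x2, x2_builtin = unname(res$statistic)))
print(c(p_manual = p, p_builtin = res$p.value))
```

All expected counts here are well above 5, so the chi-squared approximation is reasonable for this table.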

Friday, October 4, 2013

Notes

Discrete data arise from a counting process, while continuous data arise from a measuring process.

Chi-square tests must be run on raw counts (frequencies), not on percentages, proportions, means, etc.
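One way to see why: without the continuity correction, Pearson's X-squared is proportional to the total count, so multiplying every cell by k multiplies X-squared by k even though the proportions stay the same. The sketch below reuses the heart-rate counts from the Chi Square Test post and converts them to percentages of the grand total (105 animals), which shrinks the statistic by a factor of 100/105.

```r
# Sketch: why chi-squared tests need raw counts, not percentages
obs <- matrix(c(36, 30, 14, 25), nrow = 2)   # counts from the heart-rate example
pct <- 100 * obs / sum(obs)                   # same table as % of the grand total

x2_counts <- unname(chisq.test(obs, correct = FALSE)$statistic)
x2_pct <- unname(chisq.test(pct, correct = FALSE)$statistic)

# X-squared scales with the total, so rescaling the table changes the result
print(c(counts = x2_counts, percent = x2_pct))
print(x2_pct / x2_counts)   # the rescaling factor 100 / 105
```

For a study with more than 100 subjects the effect runs the other way: percentages would understate the evidence, which is why only actual counts give a valid test.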

Wednesday, October 2, 2013

R Video Link

http://www.twotorials.com/