### R Chi Square Test Example

The `chisq.test()` function performs chi-squared contingency table tests and goodness-of-fit tests.

    chisq.test(x, y = NULL, correct = TRUE,
               p = rep(1/length(x), length(x)), rescale.p = FALSE,
               simulate.p.value = FALSE, B = 2000)
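The two kinds of test differ in what is passed as `x`: a vector of counts gives a goodness-of-fit test against `p`, and a matrix gives a contingency table test. The sketch below illustrates both call forms; the counts are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Hypothetical counts, chosen only to illustrate the two call forms.
counts <- c(20, 30, 50)

# Goodness of fit: x is a vector and p gives the hypothesized proportions.
# The weights 1:1:2 do not sum to 1, so rescale.p = TRUE is required.
gof <- chisq.test(counts, p = c(1, 1, 2), rescale.p = TRUE)
gof$expected   # expected counts: 25 25 50
gof$statistic  # X-squared = 2
gof$parameter  # df = 2

# Contingency table: x is a matrix, so p is not used and the test is
# for independence of rows and columns.
tab <- matrix(c(12, 8, 5, 15), nrow = 2)
ind <- chisq.test(tab)
ind$parameter  # df = 1
```

Without `rescale.p = TRUE`, the first call would stop with an error because the entries of `p` do not sum to 1.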

The arguments are:

- `x`: a numeric vector or matrix.
- `y`: a numeric vector or a factor (if `x` is a factor of the same length), or `NULL` (if `x` is a matrix).
- `correct`: a logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables: one half is subtracted from all |O - E| differences. No correction is done if `simulate.p.value = TRUE`.
- `p`: a vector of probabilities of the same length as `x`. An error is given if any entry of `p` is negative.
- `rescale.p`: a logical scalar; if `TRUE`, then `p` is rescaled (if necessary) to sum to 1. If `rescale.p` is `FALSE` and `p` does not sum to 1, an error is given.
- `simulate.p.value`: a logical indicating whether to compute p-values by Monte Carlo simulation.
- `B`: an integer specifying the number of replicates used in the Monte Carlo test.

For example, there are 205 mutations in gene p53 among 514 tumors, while 96 stage IV tumors have 86 mutations. If stage IV tumors followed the general pattern, we would expect 96 × 205 / 514 ≈ 38 mutations among them, but we observed 86. Is that significantly different from the general mutation pattern?

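The R code below runs `chisq.test()` on a 2 by 2 matrix whose first column holds the observed mutation count and the number of tumors (86, 96) and whose second column holds the expected count and the number of tumors (38, 96). Because `x` is a matrix, R treats it as a contingency table and applies Yates' continuity correction:

    > sam <- matrix(c(86,96,38,96),nrow=2,ncol=2)
    > sam
         [,1] [,2]
    [1,]   86   38
    [2,]   96   96
    > chisq.test(sam)

            Pearson's Chi-squared test with Yates' continuity correction

    data:  sam
    X-squared = 10.7773, df = 1, p-value = 0.001028

    > chisq.test(sam)$p.value
    [1] 0.001027552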
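To see what the continuity correction does, the statistic can be reproduced by hand. The sketch below is a minimal check, not the internal implementation of `chisq.test()`; the variable names `expected`, `stat` and `p_val` are illustrative.

```r
# Same 2 by 2 table as in the example above.
sam <- matrix(c(86, 96, 38, 96), nrow = 2, ncol = 2)

# Expected counts under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(sam), colSums(sam)) / sum(sam)

# Yates' correction: subtract one half from every |O - E| before squaring.
# chisq.test() caps the correction at |O - E|; that makes no difference
# here because every difference is larger than 0.5.
stat <- sum((abs(sam - expected) - 0.5)^2 / expected)

# A 2 by 2 table has one degree of freedom.
p_val <- pchisq(stat, df = 1, lower.tail = FALSE)

round(stat, 4)  # 10.7773, matching the X-squared value reported above
```

Calling `chisq.test(sam, correct = FALSE)` skips the subtraction of one half and therefore gives a larger statistic and a smaller p-value.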

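The same test can be run on every row of a CSV file. In the example file, `chisq.csv`, each row holds a gene name followed by four counts.

The following R code runs a chi-square test on every row of the example file and writes the results to `out_chisq.txt`:

    # Read the input table: gene name in column 1, counts in columns 2-5
    x <- read.csv("chisq.csv", header = TRUE, sep = ",", dec = ".")

    # Open the output file and write the header line
    zz <- file("out_chisq.txt", "w")
    title <- names(x)
    writeLines(paste(title[1], title[2], title[3], title[4], title[5],
                     "Chisq P Value", sep = ","), con = zz, sep = "\n")

    # Test each row as a 2 by 2 table and append its p-value
    xR <- nrow(x)
    sam <- array(dim = c(2, 2))
    for (i in seq_len(xR)) {
      sam[1, ] <- c(x[i, 2], x[i, 3])
      sam[2, ] <- c(x[i, 4], x[i, 5])
      pv <- chisq.test(sam)$p.value
      writeLines(paste(x[i, 1], x[i, 2], x[i, 3], x[i, 4], x[i, 5], pv, sep = ","),
                 con = zz, sep = "\n")
    }
    close(zz)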

The content of the output file is:

    Gene,Unique.observed,Unique.expected,duplicated.observed,duplicate.expected,Chisq P Value
    TTN,27,33,60,54,0.425175749168081
    GATA3,38,20,17,35,0.00116789922038592
    HLA-DRB6,18,15,24,27,0.655008761576397
    MUC16,13,15,28,26,0.815855072976336
    NR1H2,11,15,29,25,0.473920420172139
    GPRIN2,12,14,27,25,0.810181236410474
    MAP3K1,15,14,24,25,1
    GPRIN1,13,14,25,24,1
    MLL3,12,14,26,24,0.808944275014528
    MAP3K4,8,14,29,23,0.203492032204285
    CDH1,17,12,17,22,0.326688384050414
    ENSG00000245549,15,12,18,21,0.616574005797083
    ZNF384,12,12,20,20,0.796253414737639
    FRG1B,11,11,20,20,0.790676108831151
    AKD1,9,11,21,19,0.784191229401619
    OBSCN,12,11,17,18,1
    NCOA3,8,10,20,18,0.77477725929156
    USH2A,8,10,20,18,0.77477725929156
    ENSG00000198786,12,10,15,17,0.781814003488769
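The per-row test can also be written without an explicit loop or manual `writeLines()` calls. The following is a sketch of one alternative, assuming the same column layout; it uses two rows copied from the output above in place of `chisq.csv`, and the helper `row_p`, the column name `Chisq.P.Value` and the temporary output file are illustrative choices rather than part of the original script.

```r
# Two rows copied from the output above, standing in for chisq.csv.
x <- data.frame(
  Gene                = c("TTN", "GATA3"),
  Unique.observed     = c(27, 38),
  Unique.expected     = c(33, 20),
  duplicated.observed = c(60, 17),
  duplicate.expected  = c(54, 35)
)

# p-value for one row: counts 1-2 form the first table row and
# counts 3-4 the second, as in the loop version.
row_p <- function(counts) {
  chisq.test(matrix(counts, nrow = 2, byrow = TRUE))$p.value
}

# Apply the test to the four count columns of every row.
x$Chisq.P.Value <- apply(as.matrix(x[, 2:5]), 1, row_p)

# write.csv() writes the header and separators in one call.
out_file <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(x, out_file, row.names = FALSE, quote = FALSE)
```

For the two rows used here, the computed p-values should agree with the TTN and GATA3 lines of the output file shown above.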

Download the csv file and the R source code:

__Data File__

__R Source Code File__