R Read.table Example -- EndMemo

Home
»
R
»
read.table()

R read.table()

read.table() function reads a file into data frame in table format. The file can be comma delimited or tab or any other delimiter specified by parameter "sep=". If the parameter "header=" is "TRUE", then the first row will be treated as the column names.

read.table(file, header = FALSE, sep = "", quote = "\"'",
dec = ".", row.names, col.names,
as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = !blank.lines.skip,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "", encoding = "unknown", text)

• file: file name
• header: 1st line as header or not, logical
• sep: field separator
• quote: quoting characters
• dec: the character used in the file for decimal points.
• row.names: a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names. If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered. Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be ‘automatic’ (and not preserved by as.matrix).
• col.names: a vector of optional names for the variables. The default is to use "V" followed by the column number.
• as.is: the default behavior of read.table is to convert character variables (which are not converted to logical, numeric or complex) to factors. The variable as.is controls the conversion of columns not otherwise specified by colClasses. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors. Note: to suppress all conversions including those of numeric columns, set colClasses = "character". Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.
• na.strings: a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
• colClasses: character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA. Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class. Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).
• nrows: integer: the maximum number of rows to read in. Negative and other invalid values are ignored.
• skip: integer: the number of lines of the data file to skip before beginning to read data.
• check.names: logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.
• fill: logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’.
• strip.white: logical. Used only when sep has been specified, and allows the stripping of leading and trailing white space from unquoted character fields (numeric fields are always stripped). See scan for further details (including the exact meaning of ‘white space’), remembering that the columns may include the row names.
• blank.lines.skip: logical: if TRUE blank lines in the input are ignored.
• comment.char: character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
• allowEscapes: logical. Should C-style escapes such as \n be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). For more details see scan.
• flush: logical: if TRUE, scan will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field.
• stringsAsFactors: logical: should character vectors be converted to factors? Note that this is overridden by as.is and colClasses, both of which allow finer control.
• fileEncoding: character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the ‘Encoding’ section of the help for file, the ‘R Data Import/Export Manual’ and ‘Note’.
• encoding: encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See ‘Value’.
• text: character string: if file is not supplied and this is, then data are read from the value of text via a text connection. Notice that a literal string can be used to include (small) data sets within R code.

Following is a csv file example "tp.csv

> x <- read.table("tp.csv",header=T,sep="\t");
> x
         t1   t2 t3 t4   t5   t6 t7   t8
r1        1    0  1  0 <NA>    1  0    2
r2     1    2  2  1    2    1  2    1
r3    0    0  0  2    1    1  0    1
r4     <NA>    0  1  1    2 <NA>  0    0
r5        0    2  1  1    1    0  0    0
r6        2    2  0  1    1    1  0    0
r7        2    2  0  1    1    1  0    1
r8    0    2  1  0    1    1  2    0
r9        1 <NA>  1  2 <NA>    1  0    1
r10       1    0  2  1    2    2  1 <NA>
> is.data.frame(x)

[1] TRUE

If strip.white=T, it allows the stripping of leading and trailing white space from unquoted character fields.

> x <- read.table("tp.csv",header=T,sep="\t",strip.white=T);
> x
      t1   t2 t3 t4   t5   t6 t7   t8
r1     1    0  1  0 <NA>    1  0    2
r2     1    2  2  1    2    1  2    1
r3     0    0  0  2    1    1  0    1
r4  <NA>    0  1  1    2 <NA>  0    0
r5     0    2  1  1    1    0  0    0
r6     2    2  0  1    1    1  0    0
r7     2    2  0  1    1    1  0    1
r8     0    2  1  0    1    1  2    0
r9     1 <NA>  1  2 <NA>    1  0    1
r10    1    0  2  1    2    2  1 <NA>

> ncol(x)

[1] 9

> nrow(x)

[1] 20

> rownames(x)

[1] "r1"  "r2"  "r3"  "r4"  "r5"  "r6"  "r7"  "r8"  "r9"  "r10"

skip = n argument will skip n lines from the 1st line.

> y <- read.table("tp.csv",skip=6,header=T,sep="\t",strip.white=T)
> y
      r6 X2 X2.1 X0 X1 X1.1 X1.2 X0.1 X0.2
1     r7  2    2  0  1    1    1    0    1
2     r8  0    2  1  0    1    1    2    0
3     r9  1 <NA>  1  2 <NA>    1    0    1
4    r10  1    0  2  1    2    2    1 <NA>

In order to keep the correct header, need some extra works:

> h <- read.table("tp.csv",nrows=1,header=F,sep="\t")
> y2 <- y[,2:ncol(y)]
> colnames(y2) <- unlist(h)
> rownames(y2) <- y[,1]
> y2
    t1   t2 t3 t4   t5 t6 t7   t8
r7   2    2  0  1    1  1  0    1
r8   0    2  1  0    1  1  2    0
r9   1 <NA>  1  2 <NA>  1  0    1
r10  1    0  2  1    2  2  1 <NA>
> x[7:nrow(x),]  # will get the same result

read.table() can also read from the clipboard. Let's select and copy some data from the Excel:

> z <- read.table(file="clipboard", sep="\t", header=FALSE)
> colnames(z) <- c("city","latitude","longitude")
> z
                   city   latitude longitude
1   Fernando De Noronha  -3.844965 -32.40944
2  Governador Valadares -18.849852 -41.94927
3         Iguassu Falls -25.252089 -52.02154
4          Jacareacanga  -6.206604 -57.82447
5     Juazeiro Do Norte  -7.237234 -39.32218
6         Monte Dourado  -0.866667 -52.51667
7   Presidente Prudente -22.127193 -51.38517
8             Rio Verde -17.789523 -50.92043
9             Salvadore -12.970382 -38.51238
10             Sau Luiz  -2.530731 -44.30683
11            Trombetas  -1.488766 -56.39394

R Tutorials

Data Type and Structures

Loop, Condition Statements

Plotting and Graphics

Selected Functions List