Tuesday, February 25, 2014

Convert Ensembl, Unigene, Uniprot and RefSeq IDs to Symbol IDs in R using Bioconductor

Hello, I have written a function that converts IDs from several sources to gene Symbol IDs.

The input ID types currently supported are Ensembl, Unigene, Uniprot and RefSeq.

The code is available by clicking here.

NOTE: The function depends on the Bioconductor package "org.Hs.eg.db", available here.
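The linked code is not reproduced in this post. To give a rough idea of how such a conversion can work, here is a minimal sketch under stated assumptions; it is not the author's get.symbolIDs(). org.Hs.eg.db is organised around Entrez gene IDs, so a symbol can be reached in two lookups: source ID to Entrez ID, then Entrez ID to symbol. The helper ids_to_symbols() is hypothetical. It takes the two-column tables that toTable() returns (Entrez gene_id in column 1, the other ID in column 2) and uses match(), so if an ID maps to several genes only the first row is used, and IDs with no match come back as NA.

```r
# Hypothetical helper, not the author's get.symbolIDs().
# id2gene:  2-column table, Entrez gene_id in column 1, source ID in column 2
#           (the layout toTable() returns for maps such as org.Hs.egENSEMBL)
# gene2sym: 2-column table, Entrez gene_id in column 1, symbol in column 2
#           (the layout of toTable(org.Hs.egSYMBOL))
ids_to_symbols <- function(ids, id2gene, gene2sym) {
  # Step 1: source ID -> Entrez gene ID (first matching row wins, NA if absent)
  gene <- id2gene[[1]][match(ids, id2gene[[2]])]
  # Step 2: Entrez gene ID -> symbol (NA stays NA for unmatched IDs)
  sym <- gene2sym[[2]][match(gene, gene2sym[[1]])]
  # Keep the input IDs as names so each symbol can be traced back
  names(sym) <- ids
  sym
}
```

With the package loaded, a call such as ids_to_symbols(id, toTable(org.Hs.egENSEMBL), toTable(org.Hs.egSYMBOL)) would cover the Ensembl case; swapping the first table for org.Hs.egUNIGENE, org.Hs.egREFSEQ or org.Hs.egUNIPROT covers the others.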

For example, here are 10 Ensembl IDs:

> id[1:10]
 [1] "ENSG00000121410" "ENSG00000175899" "ENSG00000256069" "ENSG00000171428"
 [5] "ENSG00000156006" "ENSG00000196136" "ENSG00000114771" "ENSG00000127837"
 [9] "ENSG00000129673" "ENSG00000090861"

And their Symbol IDs:

> res[1:10]
 [1] "A1BG"     "A2M"      "A2MP1"    "NAT1"     "NAT2"     "SERPINA3"
 [7] "AADAC"    "AAMP"     "AANAT"    "AARS"    

Here is a runnable example that converts Unigene IDs to Symbol IDs. For the other ID types, replace "unigene" with "ensembl", "refseq" or "uniprot", and use the matching map (org.Hs.egENSEMBL, org.Hs.egREFSEQ or org.Hs.egUNIPROT):

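# USAGE EXAMPLE: UNIGENE
library(org.Hs.eg.db)
# get.symbolIDs() is defined in the linked code; source() that file first
# toTable() returns a data frame: Entrez gene_id in column 1, Unigene ID in column 2
unigene <- toTable(org.Hs.egUNIGENE)
# extract 100 random Unigene entries
id <- unigene[sample(nrow(unigene), 100), 2]
id.type <- "unigene"
res <- get.symbolIDs(id, id.type)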
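On newer Bioconductor releases, AnnotationDbi's mapIds() offers a one-call route to the same kind of conversion. The sketch below is an alternative, not the author's get.symbolIDs(); the wrapper name symbols_via_mapIds is hypothetical, and it assumes your org.Hs.eg.db build still offers the requested keytype (keytypes(org.Hs.eg.db) lists what is available). Upper-casing id.type turns "ensembl", "unigene", "refseq" and "uniprot" into the keytype names ENSEMBL, UNIGENE, REFSEQ and UNIPROT.

```r
# Hypothetical alternative to get.symbolIDs(), not the author's code.
# Assumes AnnotationDbi provides mapIds() and that org.Hs.eg.db
# supports the requested keytype in your Bioconductor release.
symbols_via_mapIds <- function(id, id.type) {
  # "ensembl" -> "ENSEMBL", "refseq" -> "REFSEQ", and so on
  keytype <- toupper(id.type)
  AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                        keys = id,
                        column = "SYMBOL",
                        keytype = keytype,
                        # when one ID maps to several symbols, keep the first
                        multiVals = "first")
}
```

The result is a character vector named by the input IDs, so for the Unigene sample above symbols_via_mapIds(id, "unigene") would play the role of get.symbolIDs(id, id.type).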

Benjamin