To test a sample population of genes for over-representation of GO terms, the
GOHyperGAll function computes an enrichment test based on the hypergeometric
distribution for all nodes in the three GO networks (BP, CC and MF) and
returns the corresponding raw and Bonferroni-corrected p-values.
Subsequently, a filter function supports GO Slim analyses using default or
custom GO Slim categories. Several convenience functions are provided to process
large numbers of gene sets (e.g. clusters from partitioning results) and to
visualize the results.
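The hypergeometric test and Bonferroni correction described above can be illustrated for a single GO node with base R. This is a minimal sketch with made-up counts (N, m, k, x and n_tests are all assumptions chosen for illustration); it shows the general technique and is not necessarily identical to the internal computation of GOHyperGAll.

```r
# Toy numbers for one GO node (all values are assumptions)
N <- 1000  # genes in the annotation universe
m <- 50    # universe genes annotated to the GO node
k <- 20    # genes in the test sample
x <- 5     # sample genes annotated to the GO node

# Raw p-value: probability of observing x or more matches by chance
p_raw <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)

# Bonferroni correction for a hypothetical number of tested GO nodes
n_tests <- 200
p_adj <- p.adjust(p_raw, method = "bonferroni", n = n_tests)

print(c(raw = p_raw, bonferroni = p_adj))
```

Passing x - 1 with lower.tail = FALSE gives P(X >= x), which is the over-representation tail of the distribution.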
GOHyperGAll provides utilities similar to those of the
function in the
GOstats package. The main difference is that
GOHyperGAll simplifies the processing of large numbers of gene sets, as well
as the use of custom array-to-gene and gene-to-GO mappings.
## Generate gene-to-GO mappings and store as catDB object
makeCATdb(myfile, lib = NULL, org = "", colno = c(1, 2, 3), idconv = NULL,
    rootUK = FALSE)

## Enrichment function
GOHyperGAll(catdb, gocat = "MF", sample, Nannot = 2)

## GO slim analysis
GOHyperGAll_Subset(catdb, GOHyperGAll_result, sample = test_sample,
    type = "goSlim", myslimv)

## Reduce GO term redundancy
GOHyperGAll_Simplify(GOHyperGAll_result, gocat = "MF", cutoff = 0.001,
    correct = TRUE)

## Batch analysis of many gene sets
GOCluster_Report(catdb, setlist, id_type = "affy", method = "all", CLSZ = 10,
    cutoff = 0.001, gocats = c("MF", "BP", "CC"), myslimv = "default",
    correct = TRUE, recordSpecGO = NULL, ...)

## Bar plot of GOCluster_Report results
goBarplot(GOBatchResult, gocat)
myfile: File with gene-to-GO mappings. Sample files can be downloaded from geneontology.org (http://geneontology.org/GO.downloads.annotations.shtml) or from BioMart as shown in the example below.
colno: Column numbers in myfile of the three target columns containing GOID, GeneID and GOCAT, in that order.
org: Optional argument. Currently, the only valid option is org="Arabidopsis", which removes transcript duplications in this particular annotation.
lib: If the gene-to-GO mappings are obtained from a *.db package from Bioconductor, the package name can be specified under the lib argument.
idconv: Optional id conversion
catdb: catdb object storing mappings of genes to annotation categories. For details, see ?"SYSargs-class".
rootUK: If rootUK is set to TRUE, the root nodes are treated as terminal nodes to account for the new unknown terms.
sample: Character vector containing the test set of gene identifiers.
Nannot: Defines the minimum number of direct annotations per GO node from the sample set to determine the number of tested hypotheses for the p-value adjustment.
gocat: Specifies the GO type; can be assigned one of the following character values: "MF", "BP" and "CC".
GOHyperGAll_result: data.frame generated by GOHyperGAll.
type: GOHyperGAll_Subset subsets the GOHyperGAll results by directly assigned GO nodes or custom GO Slim categories. The argument type can be assigned values such as "goSlim".
myslimv: Optional argument to provide custom GO Slim categories.
cutoff: p-value cutoff for GO terms to show in the result.
correct: If TRUE, the function will favor the selection of terminal (information-rich) GO terms that at the same time have a large number of sample matches.
setlist: List of character vectors containing gene IDs (or array feature IDs). The names of the list components correspond to the set labels, e.g. DEG comparisons or cluster IDs.
id_type: Specifies the type of IDs in the input; can be assigned values such as "affy".
method: Specifies the analysis type, e.g. "all".
CLSZ: Minimum gene set (cluster) size to consider. Gene sets below this cutoff will be ignored.
gocats: Specifies the GO types; can be assigned the values "MF", "BP" and "CC".
recordSpecGO: Argument to report specific GO IDs for any of the three ontologies in the result data.frame, regardless of whether they meet the specified p-value cutoff, e.g. recordSpecGO=c("GO:0003674", "GO:0008150", "GO:0005575").
GOBatchResult: data.frame generated by GOCluster_Report.
...: Additional arguments to pass on.
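The role of Nannot in the p-value adjustment can be made concrete with a small base R sketch. The GO IDs, counts and p-values below are invented, and the logic is only one plausible reading of the description above (count the GO nodes with at least Nannot direct sample annotations, then use that count as the Bonferroni factor); the package's internal implementation may differ.

```r
# Hypothetical number of direct sample annotations per GO node
direct_counts <- c("GO:A" = 5, "GO:B" = 1, "GO:C" = 2, "GO:D" = 0)

Nannot <- 2

# Number of tested hypotheses: nodes with at least Nannot direct annotations
n_tests <- sum(direct_counts >= Nannot)

# Made-up raw p-values, adjusted with a Bonferroni factor of n_tests
p_raw <- c("GO:A" = 0.001, "GO:C" = 0.03)
p_adj <- pmin(1, p_raw * n_tests)

print(n_tests)
print(p_adj)
```

A larger Nannot shrinks the number of counted hypotheses and therefore makes the correction less conservative.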
GOHyperGAll_Simplify: The result data frame from GOHyperGAll
will often contain several connected GO terms with significant scores, which
can complicate the interpretation of large sample sets. To reduce this redundancy,
GOHyperGAll_Simplify subsets the data frame
by a user-specified p-value cutoff and removes from it all GO nodes with
overlapping children sets (OFFSPRING), while the best scoring nodes are
retained in the result.
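The idea behind this redundancy reduction can be sketched with a greedy filter in base R. The GO IDs, p-values and offspring sets below are invented, and the greedy strategy is an assumption made for illustration; it is not the actual algorithm of GOHyperGAll_Simplify.

```r
# Toy enrichment result (made-up GO IDs and adjusted p-values)
res <- data.frame(
  GOID = c("GO:A", "GO:B", "GO:C", "GO:D"),
  Padj = c(1e-6, 1e-5, 1e-4, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical offspring sets: each node together with its descendants
offspring <- list(
  "GO:A" = c("GO:A", "GO:A1", "GO:A2"),
  "GO:B" = c("GO:B", "GO:A1"),   # shares GO:A1 with GO:A
  "GO:C" = c("GO:C", "GO:C1"),
  "GO:D" = c("GO:D")
)

cutoff <- 0.001

# Step 1: subset by the p-value cutoff and sort, best score first
cand <- res[res$Padj <= cutoff, ]
cand <- cand[order(cand$Padj), ]

# Step 2: keep a node only if its offspring set does not overlap
# with the offspring of a better scoring node that was already kept
keep <- character(0)
covered <- character(0)
for (id in cand$GOID) {
  if (!any(offspring[[id]] %in% covered)) {
    keep <- c(keep, id)
    covered <- union(covered, offspring[[id]])
  }
}

simplified <- cand[cand$GOID %in% keep, ]
print(simplified)
```

In this toy data GO:B is dropped because it overlaps with the better scoring GO:A, and GO:D is dropped by the p-value cutoff.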
GOCluster_Report: performs the three types of GO term enrichment
analyses in batch mode: GOHyperGAll, GOHyperGAll_Subset and
GOHyperGAll_Simplify. It processes many gene sets (e.g. gene expression
clusters) and returns the results conveniently organized in a single result data frame.
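The batch pattern can be sketched in base R as follows: filter the gene sets by a minimum size, run an enrichment test per set, and stack the per-set tables into one data frame labeled by set name. The gene IDs, the gene2cat mapping, the toy_enrich helper and the CLID column name are all hypothetical stand-ins, not part of the package API.

```r
# Hypothetical gene sets; names are the set labels
setlist <- list(
  cluster1 = c("g1", "g2", "g3", "g4"),
  cluster2 = c("g5"),
  cluster3 = c("g2", "g6", "g7")
)

# Hypothetical gene-to-category mapping standing in for a catDB object
gene2cat <- data.frame(
  gene = c("g1", "g2", "g3", "g6", "g7", "g8", "g9", "g10"),
  cat  = c("catX", "catX", "catX", "catY", "catY", "catX", "catY", "catY"),
  stringsAsFactors = FALSE
)

# Toy enrichment helper: one hypergeometric test per category
toy_enrich <- function(sample, mapping) {
  universe <- unique(mapping$gene)
  N <- length(universe)
  k <- sum(sample %in% universe)
  do.call(rbind, lapply(unique(mapping$cat), function(ct) {
    members <- mapping$gene[mapping$cat == ct]
    m <- length(members)
    x <- sum(sample %in% members)
    data.frame(cat = ct, SampleMatch = x,
               Phyper = phyper(x - 1, m, N - m, k, lower.tail = FALSE),
               stringsAsFactors = FALSE)
  }))
}

# Ignore gene sets below the minimum size, then stack per-set results
CLSZ <- 2
sets <- setlist[lengths(setlist) >= CLSZ]
batch <- do.call(rbind, lapply(names(sets), function(id) {
  data.frame(CLID = id, toy_enrich(sets[[id]], gene2cat),
             stringsAsFactors = FALSE)
}))
rownames(batch) <- NULL
print(batch)
```

Keeping the set label as a column makes it straightforward to filter or plot the combined table per gene set afterwards.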
makeCATdb generates a catDB object from file.
This workflow has been published in Plant Physiol (2008) 147, 41-57.