genFeatures.Rd
Function to generate a variety of feature types from TxDb objects using utilities provided by the GenomicFeatures package. The feature types are organized per gene and can be returned on that level in their non-reduced or reduced form.
Currently supported features include intergenic, promoter, intron, exon, cds, 5'/3'UTR and different transcript types. The latter include as many transcript types as are available in the tx_type column when extracting transcripts from TxDb objects as follows:
transcripts(txdb, c("tx_name", "gene_id", "tx_type"))
genFeatures(txdb, featuretype = "all", reduce_ranges, upstream = 1000, downstream = 0, verbose = TRUE)
txdb: TxDb object.
featuretype: Feature types can be specified by assigning a character vector containing any of the following: c("tx_type", "promoter", "intron", "exon", "cds", "fiveUTR", "threeUTR", "intergenic"). The default "all" is a shorthand to select all supported features.
reduce_ranges: If set to TRUE, the feature ranges will be reduced on the gene level. As a result, overlapping feature components of the same type and from the same gene will be merged into a single range, e.g. two overlapping exons from the same gene are merged into one. Intergenic ranges are not affected by this setting. Note that all reduced feature types are labeled with the suffix '_red'.
upstream: For promoter features, defines the number of bases upstream of the transcription start site.
downstream: For promoter features, defines the number of bases downstream of the transcription start site.
verbose: verbose=FALSE turns off all print messages.
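The merging behavior described for reduce_ranges can be illustrated with a minimal sketch that uses reduce() from the IRanges package on made-up coordinates. The exon positions are hypothetical, and whether genFeatures calls reduce() in exactly this way internally is an assumption; the sketch only shows what "merged into a single range" means.

```r
library(IRanges)

# Hypothetical exons of one gene: the first two overlap, the third is separate
exons <- IRanges(start = c(100, 150, 400), end = c(200, 250, 450))

# reduce() merges overlapping (and directly adjacent) ranges into single ranges
exons_red <- reduce(exons)
exons_red
```

In this sketch the two overlapping exons collapse into one range spanning 100-250, while the separate exon at 400-450 is left unchanged.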
The results are returned as a GRangesList where each component is a GRanges object containing the range set of each feature type. Intergenic ranges are assigned unique identifiers, which are recorded in the featuretype_id column of the metadata block. For this, the IDs of their adjacent genes are concatenated with two underscores as separator. If the adjacent genes overlap with other genes, then the identifiers of those genes are included in the ID string as well, separated by a single underscore.
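The naming scheme for intergenic ranges can be sketched in base R as follows. The helper make_intergenic_id and the gene names are hypothetical and only illustrate the separator convention described above; the ordering of overlapping genes within each side is an assumption.

```r
# Hypothetical helper (not part of the package): builds an intergenic ID from
# the gene(s) on each side of the intergenic range
make_intergenic_id <- function(left_genes, right_genes) {
  left  <- paste(left_genes, collapse = "_")   # overlapping genes: single underscore
  right <- paste(right_genes, collapse = "_")
  paste(left, right, sep = "__")               # the two sides: double underscore
}

# One gene on each side
id_simple <- make_intergenic_id("geneA", "geneB")

# The left neighbor overlaps with a second gene
id_overlap <- make_intergenic_id(c("geneA", "geneA2"), "geneB")

id_simple
id_overlap
```

Here id_simple is "geneA__geneB" and id_overlap is "geneA_geneA2__geneB".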
transcripts and associated TxDb accessor functions from the GenomicFeatures package.
## Sample from GenomicFeatures package
library(GenomicFeatures)
#> Loading required package: AnnotationDbi
gffFile <- system.file("extdata", "GFF3_files", "a.gff3", package="GenomicFeatures")
txdb <- makeTxDbFromGFF(file=gffFile, format="gff3", organism="Solanum lycopersicum")
#> Import genomic features from the file as a GRanges object ...
#> OK
#> Prepare the 'metadata' data frame ...
#> OK
#> Make the TxDb object ...
#> Warning: The "phase" metadata column contains non-NA values for features of type
#> exon. This information was ignored.
#> OK
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=FALSE, upstream=1000, downstream=0)
#> Created feature ranges: mRNA
#> Created feature ranges: promoter
#> Created feature ranges: intron
#> Created feature ranges: exon
#> Created feature ranges: cds
#> Created feature ranges: fiveUTR
#> Created feature ranges: threeUTR
#> Created feature ranges: intergenic
## List extracted feature types
names(feat)
#> [1] "mRNA" "promoter" "intron" "exon" "cds"
#> [6] "fiveUTR" "threeUTR" "intergenic"
## Obtain feature lists by genes, here for promoter
split(feat$promoter, unlist(mcols(feat$promoter)$feature_by))
#> GRangesList object of length 488:
#> $Solyc00g005000.2
#> GRanges object with 1 range and 3 metadata columns:
#> seqnames ranges strand | feature_by
#> <Rle> <IRanges> <Rle> | <CharacterList>
#> Solyc00g005000.2.1 SL2.40ch00 15437-16436 + | Solyc00g005000.2
#> featuretype_id featuretype
#> <character> <character>
#> Solyc00g005000.2.1 Solyc00g005000.2.1 promoter
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $Solyc00g005020.1
#> GRanges object with 1 range and 3 metadata columns:
#> seqnames ranges strand | feature_by
#> <Rle> <IRanges> <Rle> | <CharacterList>
#> Solyc00g005020.1.1 SL2.40ch00 67062-68061 + | Solyc00g005020.1
#> featuretype_id featuretype
#> <character> <character>
#> Solyc00g005020.1.1 Solyc00g005020.1.1 promoter
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $Solyc00g005040.2
#> GRanges object with 1 range and 3 metadata columns:
#> seqnames ranges strand | feature_by
#> <Rle> <IRanges> <Rle> | <CharacterList>
#> Solyc00g005040.2.1 SL2.40ch00 549920-550919 + | Solyc00g005040.2
#> featuretype_id featuretype
#> <character> <character>
#> Solyc00g005040.2.1 Solyc00g005040.2.1 promoter
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> ...
#> <485 more elements>
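The promoter coordinates printed above can be checked by hand. The sketch below assumes the coordinate convention of promoters() in GenomicFeatures, where the range covers upstream bases before the transcription start site (TSS) and downstream bases starting at the TSS. The helper promoter_range is hypothetical and not part of any package.

```r
# Hypothetical helper: promoter interval for a given TSS and strand
promoter_range <- function(tss, strand, upstream = 1000, downstream = 0) {
  if (strand == "+") {
    c(start = tss - upstream, end = tss + downstream - 1)
  } else {
    # On the minus strand the TSS is the rightmost position of the transcript
    c(start = tss - downstream + 1, end = tss + upstream)
  }
}

# Transcript Solyc00g005000.2.1 starts at 16437 on the plus strand
prom <- promoter_range(16437, "+", upstream = 1000, downstream = 0)
prom
```

With upstream=1000 and downstream=0 this gives 15437-16436, the promoter range reported for Solyc00g005000.2.1.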
## Return all features in single GRanges object
unlist(feat)
#> GRanges object with 4886 ranges and 3 metadata columns:
#> seqnames ranges strand | feature_by
#> <Rle> <IRanges> <Rle> | <CharacterList>
#> mRNA SL2.40ch00 16437-18189 + | Solyc00g005000.2
#> mRNA SL2.40ch00 68062-68764 + | Solyc00g005020.1
#> mRNA SL2.40ch00 550920-551576 + | Solyc00g005040.2
#> mRNA SL2.40ch00 1115784-1117712 + | Solyc00g005100.1
#> mRNA SL2.40ch00 1204073-1205342 + | Solyc00g005150.1
#> ... ... ... ... . ...
#> intergenic.479 SL2.40ch00 13647836-13649254 * | INTER00000479
#> intergenic.480 SL2.40ch00 13650638-13663835 * | INTER00000480
#> intergenic.481 SL2.40ch00 13666166-13763879 * | INTER00000481
#> intergenic.482 SL2.40ch00 13767947-13768659 * | INTER00000482
#> intergenic.483 SL2.40ch00 13769332-13784265 * | INTER00000483
#> featuretype_id featuretype
#> <character> <character>
#> mRNA Solyc00g005000.2.1 mRNA
#> mRNA Solyc00g005020.1.1 mRNA
#> mRNA Solyc00g005040.2.1 mRNA
#> mRNA Solyc00g005100.1.1 mRNA
#> mRNA Solyc00g005150.1.1 mRNA
#> ... ... ...
#> intergenic.479 Solyc00g050030.1__So.. intergenic
#> intergenic.480 Solyc00g050130.1__So.. intergenic
#> intergenic.481 Solyc00g050430.2__So.. intergenic
#> intergenic.482 Solyc00g052430.2_Sol.. intergenic
#> intergenic.483 Solyc00g052540.1__So.. intergenic
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
if (FALSE) {
## Sample from systemPipeRdata package
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0)
}