Generates a variety of feature types from TxDb objects using utilities provided by the GenomicFeatures package. The feature types are organized per gene and can be returned at the gene level in either non-reduced or reduced form.

Currently supported features include intergenic, promoter, intron, exon, cds, 5'/3' UTR and different transcript types. The latter comprise as many transcript types as are available in the tx_type column when extracting transcripts from TxDb objects as follows: transcripts(txdb, c("tx_name", "gene_id", "tx_type"))

genFeatures(txdb, featuretype = "all", reduce_ranges, upstream = 1000, downstream = 0, verbose = TRUE)

Arguments

txdb

TxDb object

featuretype

Feature types can be specified by assigning a character vector containing any of the following: c("tx_type", "promoter", "intron", "exon", "cds", "fiveUTR", "threeUTR", "intergenic"). The default "all" is a shorthand to select all supported features.

reduce_ranges

If set to TRUE, the feature ranges will be reduced on the gene level. As a result, overlapping feature components of the same type and from the same gene will be merged into a single range, e.g. two overlapping exons from the same gene are merged into one. Intergenic ranges are not affected by this setting. Note that all reduced feature types are labeled with the suffix '_red'.

upstream

For promoter features, defines the number of bases upstream of the transcription start site.

downstream

For promoter features, defines the number of bases downstream of the transcription start site.

verbose

verbose=FALSE turns off all print messages.
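
The effect of reduce_ranges, upstream and downstream can be illustrated with plain GenomicRanges calls, independent of genFeatures. The following is only a minimal sketch: it assumes that the behavior described above corresponds to the standard reduce() and promoters() semantics, which this page does not state explicitly. The exon coordinates and the "chr1" sequence name are hypothetical; the transcript coordinates are taken from the sample output in the Examples section.

```r
library(GenomicRanges)

## Two overlapping exons of one hypothetical gene on the plus strand
exons <- GRanges("chr1", IRanges(start = c(100, 150), end = c(200, 300)), strand = "+")

## Merge overlapping ranges, as described for reduce_ranges=TRUE
## (assumption: genFeatures applies an equivalent operation per gene)
exons_red <- reduce(exons)
exons_red

## Promoter window for a plus-strand transcript starting at position 16437
tx <- GRanges("SL2.40ch00", IRanges(start = 16437, end = 18189), strand = "+")
prom <- promoters(tx, upstream = 1000, downstream = 0)
prom
```

With upstream=1000 and downstream=0 the promoter window ends one base before the transcription start site, which agrees with the 15437-16436 promoter range shown for Solyc00g005000.2.1 in the Examples.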

Value

The results are returned as a GRangesList where each component is a GRanges object containing the range set of each feature type. Intergenic ranges are assigned unique identifiers and recorded in the featuretype_id column of the metadata block. For this the ids of their adjacent genes are concatenated with two underscores as separator. If the adjacent genes overlap with other genes then their identifiers are included in the id string as well and separated by a single underscore.
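
The identifier scheme for intergenic ranges can be taken apart again with base R. The sketch below is illustrative only: the gene names geneA, geneB and geneC and the labels "first" and "second" are hypothetical, and it assumes that the gene identifiers themselves contain no underscores.

```r
## Hypothetical intergenic id: geneA overlaps geneB on one side, geneC lies on the other
id <- "geneA_geneB__geneC"

## Split on the double underscore first to separate the two flanking sides
sides <- strsplit(id, "__", fixed = TRUE)[[1]]

## Then split each side on single underscores to list the overlapping genes
flanking <- strsplit(sides, "_", fixed = TRUE)
names(flanking) <- c("first", "second")
flanking
```

Splitting on the double underscore before the single underscore matters, because the reverse order would leave empty strings where the two separators meet.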

Author

Thomas Girke

See also

transcripts and associated TxDb accessor functions from the GenomicFeatures package.

Examples

## Sample from GenomicFeatures package
library(GenomicFeatures)
#> Loading required package: AnnotationDbi
gffFile <- system.file("extdata", "GFF3_files", "a.gff3", package="GenomicFeatures")
txdb <- makeTxDbFromGFF(file=gffFile, format="gff3", organism="Solanum lycopersicum")
#> Import genomic features from the file as a GRanges object ... 
#> OK
#> Prepare the 'metadata' data frame ... 
#> OK
#> Make the TxDb object ... 
#> Warning: The "phase" metadata column contains non-NA values for features of type
#>   exon. This information was ignored.
#> OK
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=FALSE, upstream=1000, downstream=0)
#> Created feature ranges: mRNA 
#> Created feature ranges: promoter 
#> Created feature ranges: intron 
#> Created feature ranges: exon 
#> Created feature ranges: cds 
#> Created feature ranges: fiveUTR 
#> Created feature ranges: threeUTR 
#> Created feature ranges: intergenic 

## List extracted feature types
names(feat)
#> [1] "mRNA"       "promoter"   "intron"     "exon"       "cds"       
#> [6] "fiveUTR"    "threeUTR"   "intergenic"

## Obtain feature lists by genes, here for promoter
split(feat$promoter, unlist(mcols(feat$promoter)$feature_by))
#> GRangesList object of length 488:
#> $Solyc00g005000.2
#> GRanges object with 1 range and 3 metadata columns:
#>                        seqnames      ranges strand |       feature_by
#>                           <Rle>   <IRanges>  <Rle> |  <CharacterList>
#>   Solyc00g005000.2.1 SL2.40ch00 15437-16436      + | Solyc00g005000.2
#>                          featuretype_id featuretype
#>                             <character> <character>
#>   Solyc00g005000.2.1 Solyc00g005000.2.1    promoter
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $Solyc00g005020.1
#> GRanges object with 1 range and 3 metadata columns:
#>                        seqnames      ranges strand |       feature_by
#>                           <Rle>   <IRanges>  <Rle> |  <CharacterList>
#>   Solyc00g005020.1.1 SL2.40ch00 67062-68061      + | Solyc00g005020.1
#>                          featuretype_id featuretype
#>                             <character> <character>
#>   Solyc00g005020.1.1 Solyc00g005020.1.1    promoter
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> $Solyc00g005040.2
#> GRanges object with 1 range and 3 metadata columns:
#>                        seqnames        ranges strand |       feature_by
#>                           <Rle>     <IRanges>  <Rle> |  <CharacterList>
#>   Solyc00g005040.2.1 SL2.40ch00 549920-550919      + | Solyc00g005040.2
#>                          featuretype_id featuretype
#>                             <character> <character>
#>   Solyc00g005040.2.1 Solyc00g005040.2.1    promoter
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
#> 
#> ...
#> <485 more elements>

## Return all features in single GRanges object
unlist(feat)
#> GRanges object with 4886 ranges and 3 metadata columns:
#>                    seqnames            ranges strand |       feature_by
#>                       <Rle>         <IRanges>  <Rle> |  <CharacterList>
#>             mRNA SL2.40ch00       16437-18189      + | Solyc00g005000.2
#>             mRNA SL2.40ch00       68062-68764      + | Solyc00g005020.1
#>             mRNA SL2.40ch00     550920-551576      + | Solyc00g005040.2
#>             mRNA SL2.40ch00   1115784-1117712      + | Solyc00g005100.1
#>             mRNA SL2.40ch00   1204073-1205342      + | Solyc00g005150.1
#>              ...        ...               ...    ... .              ...
#>   intergenic.479 SL2.40ch00 13647836-13649254      * |    INTER00000479
#>   intergenic.480 SL2.40ch00 13650638-13663835      * |    INTER00000480
#>   intergenic.481 SL2.40ch00 13666166-13763879      * |    INTER00000481
#>   intergenic.482 SL2.40ch00 13767947-13768659      * |    INTER00000482
#>   intergenic.483 SL2.40ch00 13769332-13784265      * |    INTER00000483
#>                          featuretype_id featuretype
#>                             <character> <character>
#>             mRNA     Solyc00g005000.2.1        mRNA
#>             mRNA     Solyc00g005020.1.1        mRNA
#>             mRNA     Solyc00g005040.2.1        mRNA
#>             mRNA     Solyc00g005100.1.1        mRNA
#>             mRNA     Solyc00g005150.1.1        mRNA
#>              ...                    ...         ...
#>   intergenic.479 Solyc00g050030.1__So..  intergenic
#>   intergenic.480 Solyc00g050130.1__So..  intergenic
#>   intergenic.481 Solyc00g050430.2__So..  intergenic
#>   intergenic.482 Solyc00g052430.2_Sol..  intergenic
#>   intergenic.483 Solyc00g052540.1__So..  intergenic
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

if (FALSE) {
## Sample from systemPipeRdata package
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
feat <- genFeatures(txdb, featuretype="all", reduce_ranges=TRUE, upstream=1000, downstream=0)
}