cool_seq_tool.mappers.mane_transcript
Retrieve MANE transcript from a location on p./c./g. coordinates.
Steps:
1. Map annotation layer to genome
2. Lift over to preferred genome (GRCh38). GRCh36 and earlier assemblies are not supported for fetching MANE transcripts.
3. Select preferred compatible annotation (see transcript compatibility)
4. Map back to correct annotation layer
In addition to a mapper utility class, this module also defines several vocabulary constraints and data models for coordinate representation.
- class cool_seq_tool.mappers.mane_transcript.CdnaRepresentation(**data)
Define object model for coding DNA representation
- class cool_seq_tool.mappers.mane_transcript.DataRepresentation(**data)
Define object model for final output representation
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- status: TranscriptPriority
- class cool_seq_tool.mappers.mane_transcript.EndAnnotationLayer(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Define constraints for end annotation layer. This is used for determining the end annotation layer when getting the longest compatible remaining representation.
- class cool_seq_tool.mappers.mane_transcript.GenomicRepresentation(**data)
Define object model for genomic representation
- mane_genes: list[ManeGeneData]
- class cool_seq_tool.mappers.mane_transcript.ManeTranscript(seqrepo_access, transcript_mappings, mane_transcript_mappings, uta_db, liftover)
Class for retrieving MANE transcripts.
- __init__(seqrepo_access, transcript_mappings, mane_transcript_mappings, uta_db, liftover)
Initialize the ManeTranscript class.
A handful of resources are required for initialization, so when defaults are enough, it’s easiest to let the core CoolSeqTool class handle it for you:
>>> from cool_seq_tool.app import CoolSeqTool
>>> mane_mapper = CoolSeqTool().mane_transcript
Note that most methods are defined as Python coroutines, so they must be called with await or run from an async event loop:
>>> import asyncio
>>> result = asyncio.run(mane_mapper.g_to_grch38("NC_000001.11", 100, 200))
>>> result.ac
'NC_000001.11'
See the Usage section for more information.
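When several lookups are needed, it can be convenient to await them together inside one event loop rather than calling asyncio.run once per query. The following is a minimal sketch of that pattern only: StubManeMapper is a hypothetical stand-in that echoes its input so the example runs without SeqRepo or UTA, and lookup_many is an illustrative helper, not part of cool_seq_tool.

```python
import asyncio


class StubManeMapper:
    """Hypothetical stand-in for a ManeTranscript instance (no database needed)."""

    async def g_to_grch38(self, ac: str, start_pos: int, end_pos: int) -> dict:
        # Yield control to the event loop, as a real database query would.
        await asyncio.sleep(0)
        # Echo the input; a real mapper would return a GenomicRepresentation.
        return {"ac": ac, "pos": (start_pos, end_pos)}


async def lookup_many(mapper, queries):
    """Await several coroutine calls concurrently inside one event loop."""
    return await asyncio.gather(*(mapper.g_to_grch38(*q) for q in queries))


queries = [("NC_000001.11", 100, 200), ("NC_000007.14", 300, 400)]
results = asyncio.run(lookup_many(StubManeMapper(), queries))
print([r["ac"] for r in results])
```

asyncio.gather returns results in the same order as the awaitables it was given, so each result can be matched back to its query by index.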
- Parameters:
  - seqrepo_access (SeqRepoAccess) – Access to seqrepo queries
  - transcript_mappings (TranscriptMappings) – Access to transcript accession mappings and conversions
  - mane_transcript_mappings (ManeTranscriptMappings) – Access to MANE Transcript accession mapping data
  - uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database
  - liftover (LiftOver) – Instance to provide mapping between human genome assemblies
- async g_to_grch38(ac, start_pos, end_pos, get_mane_genes=False, residue_mode=ResidueMode.RESIDUE)
Return genomic coordinate on GRCh38 when not given gene context.
- Parameters:
  - ac (str) – Genomic accession
  - start_pos (int) – Genomic start position
  - end_pos (int) – Genomic end position
  - get_mane_genes (bool) – True if MANE genes for the genomic position should be included in the response. False otherwise.
  - residue_mode (ResidueMode) – Residue mode for start_pos and end_pos
- Return type: Optional[GenomicRepresentation]
- Returns: GRCh38 genomic representation (accession and start/end inter-residue position)
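Because results are reported as inter-residue positions while the default input mode is residue, it may help to see how the two conventions relate. This is a minimal sketch of the standard conversion (1-based inclusive residue coordinates to 0-based inter-residue coordinates). Mode and to_inter_residue are hypothetical local names used for illustration, not the library's own classes or helpers.

```python
from enum import Enum


class Mode(str, Enum):
    """Local stand-in for the library's ResidueMode vocabulary (assumed values)."""

    RESIDUE = "residue"
    INTER_RESIDUE = "inter-residue"


def to_inter_residue(start: int, end: int, mode: Mode) -> tuple[int, int]:
    """Convert a (start, end) pair to inter-residue coordinates."""
    if mode is Mode.RESIDUE:
        # Residue positions name the bases themselves (1-based, inclusive);
        # inter-residue positions name the boundaries between bases.
        return start - 1, end
    # Already inter-residue: nothing to change.
    return start, end


print(to_inter_residue(100, 200, Mode.RESIDUE))
```

In inter-residue form the span length is simply end minus start, which is one reason the convention is convenient for output.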
- async g_to_mane_c(ac, start_pos, end_pos, gene, residue_mode=ResidueMode.RESIDUE)
Return MANE Transcript on the c. coordinate.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> cst = CoolSeqTool()
>>> result = asyncio.run(
...     cst.mane_transcript.g_to_mane_c(
...         "NC_000007.13", 55259515, None, gene="EGFR"
...     )
... )
>>> type(result)
<class 'cool_seq_tool.mappers.mane_transcript.CdnaRepresentation'>
>>> result.status
<TranscriptPriority.MANE_SELECT: 'mane_select'>
>>> del cst
- Parameters:
  - ac (str) – Transcript accession on g. coordinate
  - start_pos (int) – Genomic start position
  - end_pos (int) – Genomic end position
  - gene (str) – HGNC gene symbol
  - residue_mode (ResidueMode) – Starting residue mode for start_pos and end_pos. Will always return coordinates in inter-residue.
- Return type: Optional[CdnaRepresentation]
- Returns: MANE Transcripts with cDNA change on c. coordinate
- async get_longest_compatible_transcript(start_pos, end_pos, start_annotation_layer, gene=None, ref=None, residue_mode=ResidueMode.RESIDUE, mane_transcripts=None, alt_ac=None, end_annotation_layer=None)
Get longest compatible transcript from a gene. See the documentation for the transcript compatibility policy for more information.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import AnnotationLayer, ResidueMode
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> mane_transcripts = {
...     "ENST00000646891.2",
...     "NM_001374258.1",
...     "NM_004333.6",
...     "ENST00000644969.2",
... }
>>> result = asyncio.run(
...     mane_mapper.get_longest_compatible_transcript(
...         599,
...         599,
...         gene="BRAF",
...         start_annotation_layer=AnnotationLayer.PROTEIN,
...         residue_mode=ResidueMode.INTER_RESIDUE,
...         mane_transcripts=mane_transcripts,
...     )
... )
>>> result.refseq
'NP_001365396.1'
If unable to find a match on GRCh38, this method will then attempt to drop down to GRCh37.
# TODO example for inputs that demonstrate this?
- Parameters:
  - start_pos (int) – Start position change
  - end_pos (int) – End position change
  - start_annotation_layer (AnnotationLayer) – Starting annotation layer
  - gene (Optional[str]) – HGNC gene symbol
  - ref (Optional[str]) – Reference at position given during input
  - residue_mode (ResidueMode) – Residue mode for start_pos and end_pos
  - mane_transcripts (Optional[set]) – Attempted MANE transcripts that were not compatible
  - alt_ac (Optional[str]) – Genomic accession
  - end_annotation_layer (Optional[EndAnnotationLayer]) – The end annotation layer. If not provided, will be set to EndAnnotationLayer.PROTEIN if start_annotation_layer == AnnotationLayer.PROTEIN, EndAnnotationLayer.CDNA otherwise
- Return type: UnionType[DataRepresentation, CdnaRepresentation, ProteinAndCdnaRepresentation, None]
- Returns: Data for longest compatible transcript
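The default described for end_annotation_layer can be written out as a small function. This is a sketch of the documented rule only: Layer, EndLayer, and resolve_end_layer are hypothetical local stand-ins with assumed member values, not the library's AnnotationLayer or EndAnnotationLayer.

```python
from enum import Enum
from typing import Optional


class Layer(str, Enum):
    """Local stand-in for AnnotationLayer (member values are assumptions)."""

    PROTEIN = "p"
    CDNA = "c"
    GENOMIC = "g"


class EndLayer(str, Enum):
    """Local stand-in for EndAnnotationLayer (only the two documented defaults)."""

    PROTEIN = "p"
    CDNA = "c"


def resolve_end_layer(start_layer: Layer, end_layer: Optional[EndLayer] = None) -> EndLayer:
    """Apply the documented default when no end layer is given."""
    if end_layer is not None:
        # An explicit choice always wins over the default.
        return end_layer
    # Protein input defaults to protein output; everything else to cDNA.
    return EndLayer.PROTEIN if start_layer is Layer.PROTEIN else EndLayer.CDNA


print(resolve_end_layer(Layer.PROTEIN), resolve_end_layer(Layer.GENOMIC))
```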
- static get_mane_c_pos_change(mane_tx_genomic_data, coding_start_site)
Get the position change on the MANE transcript's c. coordinate
- Parameters:
  - mane_tx_genomic_data (dict) – MANE transcript and genomic data
  - coding_start_site (int) – Coding start site
- Return type: tuple[int, int]
- Returns: cDNA pos start, cDNA pos end
- async get_mane_transcript(ac, start_pos, end_pos, start_annotation_layer, gene=None, ref=None, try_longest_compatible=False, residue_mode=ResidueMode.RESIDUE)
Return MANE representation
- If start_annotation_layer is AnnotationLayer.PROTEIN, will return AnnotationLayer.PROTEIN representation.
- If start_annotation_layer is AnnotationLayer.CDNA, will return AnnotationLayer.CDNA representation.
- If start_annotation_layer is AnnotationLayer.GENOMIC, will return AnnotationLayer.CDNA representation if gene is provided and AnnotationLayer.GENOMIC GRCh38 representation if gene is NOT provided.
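The three rules above amount to a small decision table. As a hedged illustration, the sketch below encodes just that table. Layer and returned_layer are hypothetical names with assumed member values, and the function returns a descriptive label rather than any real representation object.

```python
from enum import Enum
from typing import Optional


class Layer(str, Enum):
    """Local stand-in for AnnotationLayer (member values are assumptions)."""

    PROTEIN = "p"
    CDNA = "c"
    GENOMIC = "g"


def returned_layer(start_layer: Layer, gene: Optional[str] = None) -> str:
    """Name the representation that the documented rules say comes back."""
    if start_layer is Layer.PROTEIN:
        return "protein"
    if start_layer is Layer.CDNA:
        return "cDNA"
    # Genomic input: the gene symbol decides between cDNA and GRCh38 genomic.
    return "cDNA" if gene else "GRCh38 genomic"


print(returned_layer(Layer.GENOMIC, gene="BRAF"))
print(returned_layer(Layer.GENOMIC))
```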
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import AnnotationLayer, ResidueMode
>>> import asyncio
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> result = asyncio.run(
...     mane_mapper.get_mane_transcript(
...         "NP_004324.2",
...         599,
...         599,
...         AnnotationLayer.PROTEIN,
...         residue_mode=ResidueMode.INTER_RESIDUE,
...     )
... )
>>> result.gene, result.refseq, result.status
('BRAF', 'NP_004324.2', <TranscriptPriority.MANE_SELECT: 'mane_select'>)
- Parameters:
  - ac (str) – Accession
  - start_pos (int) – Start position change
  - end_pos (int) – End position change
  - start_annotation_layer (AnnotationLayer) – Starting annotation layer.
  - gene (Optional[str]) – HGNC gene symbol. If gene is not provided and start_annotation_layer is AnnotationLayer.GENOMIC, will return GRCh38 representation. If gene is provided and start_annotation_layer is AnnotationLayer.GENOMIC, will return cDNA representation.
  - ref (Optional[str]) – Reference at position given during input
  - try_longest_compatible (bool) – True if the longest compatible remaining transcript should be tried when the MANE transcript is not compatible. False otherwise.
  - residue_mode (ResidueMode) – Starting residue mode for start_pos and end_pos. Will always return coordinates in inter-residue
- Return type: UnionType[DataRepresentation, CdnaRepresentation, None]
- Returns: MANE data or longest compatible transcript data if validation checks pass, in inter-residue coordinates. Otherwise, None.
- static get_reading_frame(pos)
Return reading frame number. Only used on c. coordinate.
- Parameters:
  - pos (int) – cDNA position
- Return type: int
- Returns: Reading frame
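For readers unfamiliar with reading frames, the arithmetic can be sketched as follows, assuming a 1-based c. position and frames numbered 1 to 3. The function names are illustrative and this is not necessarily how the library computes the value. codon_number is an extra helper, not part of the documented API, included to show how a c. position relates to a protein position.

```python
def reading_frame(c_pos: int) -> int:
    """Return 1, 2, or 3: the position of a 1-based c. coordinate within its codon."""
    frame = c_pos % 3
    # A remainder of 0 means the third (last) base of a codon.
    return 3 if frame == 0 else frame


def codon_number(c_pos: int) -> int:
    """Return the 1-based codon (amino acid) index containing a 1-based c. coordinate."""
    return (c_pos + 2) // 3


for pos in (1798, 1799, 1800):
    print(pos, reading_frame(pos), codon_number(pos))
```

Positions 1798 through 1800 all fall in codon 600, in frames 1, 2, and 3 respectively.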
- async grch38_to_mane_c_p(alt_ac, start_pos, end_pos, gene=None, residue_mode=ResidueMode.RESIDUE, try_longest_compatible=False)
Given GRCh38 genomic representation, return cDNA and protein representation.
Will try MANE Select and then MANE Plus Clinical. If neither is found and try_longest_compatible is set to True, will also try to find the longest compatible remaining representation.
- Parameters:
  - alt_ac (str) – Genomic RefSeq accession on GRCh38
  - start_pos (int) – Start position
  - end_pos (int) – End position
  - gene (Optional[str]) – HGNC gene symbol
  - residue_mode (ResidueMode) – Starting residue mode for start_pos and end_pos. Will always return coordinates as inter-residue.
  - try_longest_compatible (bool) – True if the longest compatible remaining transcript should be tried when the MANE transcript(s) are not compatible. False otherwise.
- Return type: Optional[dict]
- Returns: If successful, return MANE data or longest compatible remaining (if try_longest_compatible set to True) cDNA and protein representation. Will return inter-residue coordinates.
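The fallback order described for this method (MANE Select, then MANE Plus Clinical, then optionally the longest compatible remaining transcript) can be sketched as a simple priority loop. Everything below is illustrative: select_transcript, the callbacks, and the status strings other than 'mane_select' are assumptions, and the accessions are plain strings rather than results of a database lookup.

```python
from typing import Callable, Optional


def select_transcript(
    mane: dict[str, str],
    is_compatible: Callable[[str], bool],
    try_longest_compatible: bool = False,
    find_longest: Optional[Callable[[], Optional[str]]] = None,
) -> Optional[tuple[str, str]]:
    """Return (status, accession) for the first usable transcript, else None."""
    # Try the MANE statuses in priority order.
    for status in ("mane_select", "mane_plus_clinical"):
        ac = mane.get(status)
        if ac is not None and is_compatible(ac):
            return status, ac
    # Only fall back further when the caller opted in.
    if try_longest_compatible and find_longest is not None:
        ac = find_longest()
        if ac is not None:
            return "longest_compatible_remaining", ac
    return None


mane = {"mane_select": "NM_004333.6", "mane_plus_clinical": "NM_001374258.1"}
# Pretend the MANE Select transcript fails validation for this position.
print(select_transcript(mane, lambda ac: ac != "NM_004333.6"))
```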
- validate_index(ac, pos, coding_start_site)
Validate that positions actually exist on accession
- Parameters:
  - ac (str) – Accession
  - pos (tuple[int, int]) – Start position change, End position change
  - coding_start_site (int) – Coding start site for accession
- Return type: bool
- Returns: True if positions exist on accession. False otherwise
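Conceptually, this kind of check shifts c. positions by the coding start site and asks whether the resulting span lies within the transcript sequence. The sketch below shows that idea against an in-memory dictionary. SEQUENCES, the accession TX_EXAMPLE.1, and positions_exist are all hypothetical, and the real method presumably consults SeqRepo rather than a dictionary.

```python
# Hypothetical transcript: 3 bases of 5' UTR followed by a 12-base coding sequence.
SEQUENCES = {"TX_EXAMPLE.1": "GGGATGAAACCCTAG"}


def positions_exist(ac: str, pos: tuple[int, int], coding_start_site: int) -> bool:
    """Check that an inter-residue c. span maps onto real bases of the accession."""
    seq = SEQUENCES.get(ac)
    if seq is None:
        # Unknown accession: nothing to validate against.
        return False
    # Shift from c. coordinates to transcript coordinates.
    start = pos[0] + coding_start_site
    end = pos[1] + coding_start_site
    return 0 <= start <= end <= len(seq)


print(positions_exist("TX_EXAMPLE.1", (0, 3), 3))    # first codon
print(positions_exist("TX_EXAMPLE.1", (10, 13), 3))  # runs past the end
```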
- class cool_seq_tool.mappers.mane_transcript.ProteinAndCdnaRepresentation(**data)
Define object model for protein and cDNA representation
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- protein: DataRepresentation