cool_seq_tool.mappers.mane_transcript
Retrieve MANE transcript from a location on p./c./g. coordinates.
Steps:
1. Map annotation layer to genome
2. Liftover to preferred genome (GRCh38). GRCh36 and earlier assemblies are not supported for fetching MANE transcripts.
3. Select preferred compatible annotation (see transcript compatibility)
4. Map back to correct annotation layer
In addition to a mapper utility class, this module also defines several vocabulary constraints and data models for coordinate representation.
- class cool_seq_tool.mappers.mane_transcript.CdnaRepresentation(**data)
Define object model for coding DNA representation
- class cool_seq_tool.mappers.mane_transcript.DataRepresentation(**data)
Define object model for final output representation
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- status: TranscriptPriority
- class cool_seq_tool.mappers.mane_transcript.EndAnnotationLayer(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Define constraints for end annotation layer. This is used for determining the end annotation layer when getting the longest compatible remaining representation.
- class cool_seq_tool.mappers.mane_transcript.GenomicRepresentation(**data)
Define object model for genomic representation
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- status: TranscriptPriority
- class cool_seq_tool.mappers.mane_transcript.ManeTranscript(seqrepo_access, transcript_mappings, mane_transcript_mappings, uta_db)
Class for retrieving MANE transcripts.
- __init__(seqrepo_access, transcript_mappings, mane_transcript_mappings, uta_db)
Initialize the ManeTranscript class.
A handful of resources are required for initialization, so when the defaults are sufficient, it is easiest to let the core CoolSeqTool class handle it for you:
>>> from cool_seq_tool.app import CoolSeqTool
>>> mane_mapper = CoolSeqTool().mane_transcript
Note that most methods are defined as Python coroutines, so they must either be awaited from within another coroutine or run in an event loop (for example, with asyncio.run):
>>> import asyncio
>>> result = asyncio.run(mane_mapper.g_to_grch38("NC_000001.11", 100, 200))
>>> result['ac']
'NC_000001.11'
See the Usage section for more information.
- Parameters:
  - seqrepo_access (SeqRepoAccess) – Access to seqrepo queries
  - transcript_mappings (TranscriptMappings) – Access to transcript accession mappings and conversions
  - mane_transcript_mappings (ManeTranscriptMappings) – Access to MANE Transcript accession mapping data
  - uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database
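Because most methods on this class are coroutines, several lookups can share one event loop instead of calling asyncio.run once per query. The sketch below shows that pattern with a stand-in coroutine. Note that fake_g_to_grch38 and the "pos" key in its return value are hypothetical placeholders, not part of cool_seq_tool; in real use, mane_mapper.g_to_grch38 would take its place.

```python
import asyncio
from typing import Dict, List, Optional, Tuple


async def fake_g_to_grch38(ac: str, start_pos: int, end_pos: int) -> Optional[Dict]:
    """Hypothetical stand-in for an async mapper method (no database access)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real query would
    # The "pos" key is an assumption made for this sketch only.
    return {"ac": ac, "pos": (start_pos, end_pos)}


async def lift_many(queries: List[Tuple[str, int, int]]) -> List[Optional[Dict]]:
    """Await several coroutine calls concurrently inside a single event loop."""
    return await asyncio.gather(*(fake_g_to_grch38(*q) for q in queries))


queries = [("NC_000001.11", 100, 200), ("NC_000007.14", 300, 400)]
results = asyncio.run(lift_many(queries))
print([r["ac"] for r in results])
```

asyncio.gather preserves the order of its arguments, so results line up with queries even though the calls run concurrently.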
- async g_to_grch38(ac, start_pos, end_pos)
Return genomic coordinate on GRCh38 when not given gene context.
- Parameters:
  - ac (str) – Genomic accession
  - start_pos (int) – Genomic start position
  - end_pos (int) – Genomic end position
- Return type:
  Optional[Dict]
- Returns:
  NC accession, start and end pos on GRCh38 assembly
- async g_to_mane_c(ac, start_pos, end_pos, gene=None, residue_mode=ResidueMode.RESIDUE)
Return MANE Transcript on the c. coordinate.
If an argument for gene is provided, lifts to GRCh38, then gets the MANE cDNA representation.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> cst = CoolSeqTool()
>>> result = asyncio.run(cst.mane_transcript.g_to_mane_c(
...     "NC_000007.13",
...     55259515,
...     None,
...     gene="EGFR"
... ))
>>> type(result)
<class 'cool_seq_tool.mappers.mane_transcript.CdnaRepresentation'>
>>> result.status
<TranscriptPriority.MANE_SELECT: 'mane_select'>
>>> del cst
Locating a MANE transcript requires a gene symbol argument; if none is given, this method will only lift over to genomic coordinates on GRCh38.
- Parameters:
  - ac (str) – Genomic accession (g. coordinate)
  - start_pos (int) – Genomic start position
  - end_pos (int) – Genomic end position
  - gene (Optional[str]) – HGNC gene symbol
  - residue_mode (ResidueMode) – Starting residue mode for start_pos and end_pos. Will always return coordinates in inter-residue.
- Return type:
  Union[GenomicRepresentation, CdnaRepresentation, None]
- Returns:
  MANE transcript with cDNA change on c. coordinate if gene is provided. Otherwise, GRCh38 data
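The residue_mode parameter tells the method how to read start_pos and end_pos, while results always come back in inter-residue coordinates. As a rough illustration of the difference, the sketch below converts a 1-based, inclusive residue range to a 0-based inter-residue range. It assumes the common convention in which only the start shifts by one. The function name and the string mode values are hypothetical, and the library's own conversion may differ in detail.

```python
from typing import Tuple


def to_inter_residue(
    start_pos: int, end_pos: int, residue_mode: str = "residue"
) -> Tuple[int, int]:
    """Convert a range to inter-residue coordinates (hypothetical helper).

    Assumes "residue" means 1-based inclusive positions and "inter-residue"
    means 0-based positions that sit between residues.
    """
    if residue_mode == "residue":
        # Residue N occupies the gap between inter-residue positions N-1 and N.
        return start_pos - 1, end_pos
    if residue_mode == "inter-residue":
        return start_pos, end_pos
    raise ValueError(f"Unrecognized residue mode: {residue_mode}")


single_base = to_inter_residue(55259515, 55259515)
unchanged = to_inter_residue(599, 600, residue_mode="inter-residue")
print(single_base, unchanged)
```

Under this convention a single residue always becomes a range of width one, which is why a one-base query such as position 55259515 maps to (55259514, 55259515).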
- async get_longest_compatible_transcript(start_pos, end_pos, start_annotation_layer, gene=None, ref=None, residue_mode=ResidueMode.RESIDUE, mane_transcripts=None, alt_ac=None, end_annotation_layer=None)
Get the longest compatible transcript from a gene. See the documentation for the transcript compatibility policy for more information.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import AnnotationLayer, ResidueMode
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> mane_transcripts = {
...     "ENST00000646891.2",
...     "NM_001374258.1",
...     "NM_004333.6",
...     "ENST00000644969.2",
... }
>>> result = asyncio.run(mane_mapper.get_longest_compatible_transcript(
...     599,
...     599,
...     gene="BRAF",
...     start_annotation_layer=AnnotationLayer.PROTEIN,
...     residue_mode=ResidueMode.INTER_RESIDUE,
...     mane_transcripts=mane_transcripts,
... ))
>>> result.refseq
'NP_001365396.1'
If unable to find a match on GRCh38, this method will then attempt to drop down to GRCh37.
- Parameters:
  - start_pos (int) – Start position change
  - end_pos (int) – End position change
  - start_annotation_layer (AnnotationLayer) – Starting annotation layer
  - gene (Optional[str]) – HGNC gene symbol
  - ref (Optional[str]) – Reference at position given during input
  - residue_mode (ResidueMode) – Residue mode for start_pos and end_pos
  - mane_transcripts (Optional[Set]) – MANE transcripts that were already attempted and found not to be compatible
  - alt_ac (Optional[str]) – Genomic accession
  - end_annotation_layer (Optional[EndAnnotationLayer]) – The end annotation layer. If not provided, will be set to EndAnnotationLayer.PROTEIN if start_annotation_layer == AnnotationLayer.PROTEIN, EndAnnotationLayer.CDNA otherwise
- Return type:
  Union[DataRepresentation, CdnaRepresentation, ProteinAndCdnaRepresentation, None]
- Returns:
  Data for longest compatible transcript
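The default for end_annotation_layer described above amounts to a small decision rule. The sketch below restates it with local stand-in enums so that it runs without the library. The member values, the CDNA and GENOMIC members of AnnotationLayer, and the resolve_end_layer helper are assumptions made for illustration; only the rule itself comes from the parameter description.

```python
from enum import Enum
from typing import Optional


class AnnotationLayer(Enum):
    """Local stand-in; member values are assumptions, not the library's."""

    PROTEIN = "p"
    CDNA = "c"
    GENOMIC = "g"


class EndAnnotationLayer(Enum):
    """Local stand-in covering only the two members named in the docs."""

    PROTEIN = "p"
    CDNA = "c"


def resolve_end_layer(
    start_layer: AnnotationLayer,
    end_layer: Optional[EndAnnotationLayer] = None,
) -> EndAnnotationLayer:
    """Apply the documented default when no end layer is supplied."""
    if end_layer is not None:
        return end_layer  # an explicit choice always wins
    if start_layer is AnnotationLayer.PROTEIN:
        return EndAnnotationLayer.PROTEIN
    return EndAnnotationLayer.CDNA


print(resolve_end_layer(AnnotationLayer.PROTEIN))
print(resolve_end_layer(AnnotationLayer.GENOMIC))
```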
- static get_mane_c_pos_change(mane_tx_genomic_data, coding_start_site)
Get the position change on the MANE transcript's c. coordinate.
- Parameters:
  - mane_tx_genomic_data (Dict) – MANE transcript and genomic data
  - coding_start_site (int) – Coding start site
- Return type:
  Tuple[int, int]
- Returns:
  cDNA pos start, cDNA pos end
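To make the role of coding_start_site concrete: c. coordinates are counted from the start of the coding sequence, so a transcript-relative position becomes a c. position once the coding start offset is subtracted. The sketch below shows only that arithmetic. It assumes 0-based transcript offsets and takes the positions as a plain tuple rather than the mane_tx_genomic_data dict, whose keys are not described here. tx_to_c_pos is a hypothetical helper, not the library's implementation, and the numbers are made up for the example.

```python
from typing import Tuple


def tx_to_c_pos(tx_pos_range: Tuple[int, int], coding_start_site: int) -> Tuple[int, int]:
    """Shift transcript-relative positions so they count from the CDS start.

    Hypothetical helper: assumes both positions and the coding start site
    are offsets measured from the first base of the transcript.
    """
    tx_start, tx_end = tx_pos_range
    return tx_start - coding_start_site, tx_end - coding_start_site


# Illustrative values only: a one-base range at transcript offsets 2000-2001
# on a transcript whose coding sequence begins at offset 226.
c_pos = tx_to_c_pos((2000, 2001), 226)
print(c_pos)
```

Positions upstream of the coding start come out negative under this arithmetic, which corresponds to the 5' untranslated region.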
- async get_mane_transcript(ac, start_pos, end_pos, start_annotation_layer, gene=None, ref=None, try_longest_compatible=False, residue_mode=ResidueMode.RESIDUE)
Return MANE transcript.
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import AnnotationLayer, ResidueMode
>>> import asyncio
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> result = asyncio.run(mane_mapper.get_mane_transcript(
...     "NP_004324.2",
...     599,
...     599,
...     AnnotationLayer.PROTEIN,
...     residue_mode=ResidueMode.INTER_RESIDUE,
... ))
>>> result.gene, result.refseq, result.status
('BRAF', 'NP_004324.2', <TranscriptPriority.MANE_SELECT: 'mane_select'>)
- Parameters:
  - ac (str) – Accession
  - start_pos (int) – Start position change
  - end_pos (int) – End position change
  - start_annotation_layer (AnnotationLayer) – Starting annotation layer
  - gene (Optional[str]) – HGNC gene symbol
  - ref (Optional[str]) – Reference at position given during input
  - try_longest_compatible (bool) – True to try the longest compatible remaining transcript if the MANE transcript is not compatible. False otherwise.
  - residue_mode (ResidueMode) – Starting residue mode for start_pos and end_pos. Will always return coordinates in inter-residue
- Return type:
  Union[DataRepresentation, CdnaRepresentation, None]
- Returns:
  MANE data, or data for the longest compatible transcript, if validation checks pass. Coordinates are returned as inter-residue. Otherwise, None.
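When the starting annotation layer is protein, a position has to be related to the coding sequence before it can be placed on the genome. The arithmetic behind that step is the codon rule: each amino acid spans three coding bases. The sketch below applies it in inter-residue coordinates. protein_to_cds_range is a hypothetical helper for illustration only, and it ignores the transcript alignment work that the library performs.

```python
from typing import Tuple


def protein_to_cds_range(start: int, end: int) -> Tuple[int, int]:
    """Map an inter-residue protein range onto the coding sequence.

    Hypothetical helper: inter-residue boundary k on the protein sits at
    inter-residue boundary 3 * k on the coding sequence.
    """
    if start < 0 or end < start:
        raise ValueError("Expected 0 <= start <= end")
    return start * 3, end * 3


# Amino acid 600 occupies the inter-residue range (599, 600).
codon_range = protein_to_cds_range(599, 600)
print(codon_range)
```

The resulting range (1797, 1800) covers coding bases 1798 through 1800 in 1-based terms, that is, the three bases of codon 600.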
- async grch38_to_mane_c_p(alt_ac, start_pos, end_pos, gene=None, residue_mode=ResidueMode.RESIDUE, try_longest_compatible=False)
Given GRCh38 genomic representation, return cDNA and protein representation.
Will try MANE Select and then MANE Plus Clinical. If neither is found and try_longest_compatible is set to True, will also try to find the longest compatible remaining representation.
- Parameters:
  - alt_ac (str) – Genomic RefSeq accession on GRCh38
  - start_pos (int) – Start position
  - end_pos (int) – End position
  - gene (Optional[str]) – HGNC gene symbol
  - residue_mode (ResidueMode) – Starting residue mode for start_pos and end_pos. Will always return coordinates as inter-residue.
  - try_longest_compatible (bool) – True to try the longest compatible remaining transcript if the MANE transcript(s) are not compatible. False otherwise.
- Return type:
  Optional[Dict]
- Returns:
  If successful, return MANE data or longest compatible remaining (if try_longest_compatible set to True) cDNA and protein representation. Will return inter-residue coordinates.
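The fallback order described for this method (MANE Select, then MANE Plus Clinical, then optionally the longest compatible remaining transcript) can be pictured as a priority search. The sketch below is a simplified illustration over an in-memory dict. The pick_representation helper, the candidate data, and the status strings other than "mane_select" are assumptions made for the example and do not reflect the library's internals.

```python
from typing import Dict, Optional

# "mane_select" appears in the documented TranscriptPriority output; the
# other two status strings are assumed for this sketch.
MANE_ORDER = ("mane_select", "mane_plus_clinical")
LONGEST = "longest_compatible_remaining"


def pick_representation(
    candidates: Dict[str, Optional[dict]],
    try_longest_compatible: bool = False,
) -> Optional[dict]:
    """Return the first available candidate in priority order, else None."""
    for status in MANE_ORDER:
        hit = candidates.get(status)
        if hit is not None:
            return hit
    if try_longest_compatible:
        return candidates.get(LONGEST)
    return None


# Hypothetical lookup results: no MANE transcript was compatible.
candidates = {
    "mane_select": None,
    "mane_plus_clinical": None,
    LONGEST: {"status": LONGEST},
}
without_fallback = pick_representation(candidates)
with_fallback = pick_representation(candidates, try_longest_compatible=True)
print(without_fallback, with_fallback)
```

Leaving try_longest_compatible at its default keeps the result restricted to MANE transcripts, at the cost of returning None more often.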
- class cool_seq_tool.mappers.mane_transcript.ProteinAndCdnaRepresentation(**data)
Define object model for protein and cDNA representation
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- protein: DataRepresentation