cool_seq_tool.mappers.mane_transcript

Retrieve MANE transcript from a location on p./c./g. coordinates.

Steps:

  1. Map annotation layer to genome

  2. Liftover to preferred genome (GRCh38). GRCh36 and earlier assemblies are not supported for fetching MANE transcripts.

  3. Select preferred compatible annotation (see transcript compatibility)

  4. Map back to correct annotation layer

In addition to a mapper utility class, this module also defines several vocabulary constraints and data models for coordinate representation.

class cool_seq_tool.mappers.mane_transcript.CdnaRepresentation(**data)

Define object model for coding DNA representation.

alt_ac: Optional[str]
coding_end_site: int
coding_start_site: int

class cool_seq_tool.mappers.mane_transcript.DataRepresentation(**data)

Define object model for final output representation.

ensembl: Optional[str]
gene: Optional[str]
pos: tuple[int, int]
refseq: Optional[str]
status: TranscriptPriority
strand: Strand

class cool_seq_tool.mappers.mane_transcript.EndAnnotationLayer(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Define constraints for end annotation layer. This is used for determining the end annotation layer when getting the longest compatible remaining representation.

CDNA = 'c'
PROTEIN = 'p'
PROTEIN_AND_CDNA = 'p_and_c'

class cool_seq_tool.mappers.mane_transcript.GenomicRepresentation(**data)

Define object model for genomic representation.

ac: str
mane_genes: list[ManeGeneData]
pos: tuple[int, int]
status: Literal['grch38']

class cool_seq_tool.mappers.mane_transcript.ManeTranscript(seqrepo_access, transcript_mappings, mane_transcript_mappings, uta_db, liftover)

Class for retrieving MANE transcripts.

__init__(seqrepo_access, transcript_mappings, mane_transcript_mappings, uta_db, liftover)

Initialize the ManeTranscript class.

A handful of resources are required for initialization, so when defaults are enough, it’s easiest to let the core CoolSeqTool class handle it for you:

>>> from cool_seq_tool import CoolSeqTool
>>> mane_mapper = CoolSeqTool().mane_transcript

Note that most methods are defined as Python coroutines, so they must be called with await or run from an async event loop:

>>> import asyncio
>>> result = asyncio.run(mane_mapper.g_to_grch38("NC_000001.11", 100, 200))
>>> result.ac
'NC_000001.11'

See the Usage section for more information.
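The `await` form mentioned above can be sketched as follows. This is a minimal illustration that runs without any genomic data sources: `fake_g_to_grch38` is a hypothetical stand-in for a mapper coroutine, not part of cool_seq_tool, and it simply echoes its inputs.

```python
import asyncio


async def fake_g_to_grch38(ac: str, start_pos: int, end_pos: int) -> dict:
    """Hypothetical stand-in for an async mapper method; performs no lookup."""
    await asyncio.sleep(0)  # yield to the event loop, as a real query would
    return {"ac": ac, "pos": (start_pos, end_pos)}


async def main() -> dict:
    # Inside a coroutine, use ``await`` rather than asyncio.run()
    return await fake_g_to_grch38("NC_000001.11", 100, 200)


# asyncio.run() is only needed at the outermost, synchronous entry point
result = asyncio.run(main())
print(result["ac"])
```

With the real mapper, the body of `main` would presumably await `mane_mapper.g_to_grch38(...)` in the same way.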

Parameters:
  • seqrepo_access (SeqRepoAccess) – Access to seqrepo queries

  • transcript_mappings (TranscriptMappings) – Access to transcript accession mappings and conversions

  • mane_transcript_mappings (ManeTranscriptMappings) – Access to MANE Transcript accession mapping data

  • uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database

  • liftover (LiftOver) – Instance to provide mapping between human genome assemblies

async g_to_grch38(ac, start_pos, end_pos, get_mane_genes=False, coordinate_type=CoordinateType.RESIDUE)

Return genomic coordinate on GRCh38 when not given gene context.

Parameters:
  • ac (str) – Genomic accession

  • start_pos (int) – Genomic start position

  • end_pos (int) – Genomic end position

  • get_mane_genes (bool) – True if MANE genes for the genomic position should be included in the response. False otherwise.

  • coordinate_type (CoordinateType) – Coordinate type for start_pos and end_pos

Return type:

Optional[GenomicRepresentation]

Returns:

GRCh38 genomic representation (accession and start/end inter-residue position)
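Since inputs default to residue coordinates while results are reported as inter-residue positions, a small sketch of the usual relationship between the two may help. It assumes the common convention that residue coordinates are 1-based and inclusive and that inter-residue coordinates count the boundaries between residues starting at 0. `residue_to_inter_residue` is a hypothetical helper, not a cool_seq_tool function, and the library's internal handling may differ.

```python
def residue_to_inter_residue(start_pos: int, end_pos: int) -> tuple[int, int]:
    """Convert a 1-based inclusive range to a 0-based inter-residue range.

    Hypothetical helper for illustration; not part of cool_seq_tool.
    """
    if start_pos < 1 or end_pos < start_pos:
        raise ValueError("expected 1 <= start_pos <= end_pos")
    # Only the start shifts: residue N lies between boundaries N - 1 and N
    return start_pos - 1, end_pos


print(residue_to_inter_residue(100, 200))  # (99, 200)
print(residue_to_inter_residue(600, 600))  # (599, 600)
```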

async g_to_mane_c(ac, start_pos, end_pos, gene, coordinate_type=CoordinateType.RESIDUE)

Return MANE Transcript on the c. coordinate.

>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> cst = CoolSeqTool()
>>> result = asyncio.run(
...     cst.mane_transcript.g_to_mane_c(
...         "NC_000007.13", 55259515, None, gene="EGFR"
...     )
... )
>>> type(result)
<class 'cool_seq_tool.mappers.mane_transcript.CdnaRepresentation'>
>>> result.status
<TranscriptPriority.MANE_SELECT: 'mane_select'>
>>> del cst

Parameters:
  • ac (str) – Genomic accession (g. coordinate)

  • start_pos (int) – Genomic start position

  • end_pos (int) – Genomic end position

  • gene (str) – HGNC gene symbol

  • coordinate_type (CoordinateType) – Coordinate type of start_pos and end_pos. The result always uses inter-residue coordinates.

Return type:

Optional[CdnaRepresentation]

Returns:

MANE Transcripts with cDNA change on c. coordinate

async get_longest_compatible_transcript(start_pos, end_pos, start_annotation_layer, gene=None, ref=None, coordinate_type=CoordinateType.RESIDUE, mane_transcripts=None, alt_ac=None, end_annotation_layer=None)

Get longest compatible transcript from a gene. See the documentation for the transcript compatibility policy for more information.

>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> from cool_seq_tool.schemas import AnnotationLayer, CoordinateType
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> mane_transcripts = {
...     "ENST00000646891.2",
...     "NM_001374258.1",
...     "NM_004333.6",
...     "ENST00000644969.2",
... }
>>> result = asyncio.run(
...     mane_mapper.get_longest_compatible_transcript(
...         599,
...         599,
...         gene="BRAF",
...         start_annotation_layer=AnnotationLayer.PROTEIN,
...         coordinate_type=CoordinateType.INTER_RESIDUE,
...         mane_transcripts=mane_transcripts,
...     )
... )
>>> result.refseq
'NP_001365396.1'

If unable to find a match on GRCh38, this method will then attempt to drop down to GRCh37.

Parameters:
  • start_pos (int) – Start position change

  • end_pos (int) – End position change

  • start_annotation_layer (AnnotationLayer) – Starting annotation layer

  • gene (Optional[str]) – HGNC gene symbol

  • ref (Optional[str]) – Reference at position given during input

  • coordinate_type (CoordinateType) – Coordinate type for start_pos and end_pos

  • mane_transcripts (Optional[set]) – MANE transcripts that were already attempted and found not compatible

  • alt_ac (Optional[str]) – Genomic accession

  • end_annotation_layer (Optional[EndAnnotationLayer]) – The end annotation layer. If not provided, will be set to EndAnnotationLayer.PROTEIN if start_annotation_layer == AnnotationLayer.PROTEIN, EndAnnotationLayer.CDNA otherwise

Return type:

UnionType[DataRepresentation, CdnaRepresentation, ProteinAndCdnaRepresentation, None]

Returns:

Data for longest compatible transcript
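The default described for end_annotation_layer can be sketched as a small resolver. The enums below are local stand-ins so the snippet runs on its own. The EndAnnotationLayer values are the ones listed on this page, while the AnnotationLayer values and the `resolve_end_annotation_layer` helper are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class AnnotationLayer(str, Enum):
    """Local stand-in; member values are assumed, not taken from this page."""

    PROTEIN = "p"
    CDNA = "c"
    GENOMIC = "g"


class EndAnnotationLayer(str, Enum):
    """Local stand-in mirroring the values documented above."""

    PROTEIN = "p"
    CDNA = "c"
    PROTEIN_AND_CDNA = "p_and_c"


def resolve_end_annotation_layer(
    start_annotation_layer: AnnotationLayer,
    end_annotation_layer: Optional[EndAnnotationLayer] = None,
) -> EndAnnotationLayer:
    """Apply the documented default when no end layer is provided."""
    if end_annotation_layer is not None:
        return end_annotation_layer  # an explicit choice always wins
    if start_annotation_layer == AnnotationLayer.PROTEIN:
        return EndAnnotationLayer.PROTEIN
    return EndAnnotationLayer.CDNA


print(resolve_end_annotation_layer(AnnotationLayer.PROTEIN).name)  # PROTEIN
print(resolve_end_annotation_layer(AnnotationLayer.GENOMIC).name)  # CDNA
```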

static get_mane_c_pos_change(mane_tx_genomic_data, coding_start_site)

Get the MANE c. position change.

Parameters:
  • mane_tx_genomic_data (GenomicTxMetadata) – MANE transcript and genomic data

  • coding_start_site (int) – Coding start site

Return type:

tuple[int, int]

Returns:

cDNA pos start, cDNA pos end

async get_mane_transcript(ac, start_pos, end_pos, start_annotation_layer, gene=None, ref=None, try_longest_compatible=False, coordinate_type=CoordinateType.RESIDUE)

Return MANE representation.

If start_annotation_layer is AnnotationLayer.PROTEIN, will return AnnotationLayer.PROTEIN representation.

If start_annotation_layer is AnnotationLayer.CDNA, will return AnnotationLayer.CDNA representation.

If start_annotation_layer is AnnotationLayer.GENOMIC, will return AnnotationLayer.CDNA representation if gene is provided and AnnotationLayer.GENOMIC GRCh38 representation if gene is NOT provided.

>>> from cool_seq_tool import CoolSeqTool
>>> from cool_seq_tool.schemas import AnnotationLayer, CoordinateType
>>> import asyncio
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> result = asyncio.run(
...     mane_mapper.get_mane_transcript(
...         "NP_004324.2",
...         599,
...         600,
...         AnnotationLayer.PROTEIN,
...         coordinate_type=CoordinateType.INTER_RESIDUE,
...     )
... )
>>> result.gene, result.refseq, result.status
('BRAF', 'NP_004324.2', <TranscriptPriority.MANE_SELECT: 'mane_select'>)

Parameters:
  • ac (str) – Accession

  • start_pos (int) – Start position change

  • end_pos (int) – End position change

  • start_annotation_layer (AnnotationLayer) – Starting annotation layer.

  • gene (Optional[str]) – HGNC gene symbol. If gene is not provided and start_annotation_layer is AnnotationLayer.GENOMIC, will return GRCh38 representation. If gene is provided and start_annotation_layer is AnnotationLayer.GENOMIC, will return cDNA representation.

  • ref (Optional[str]) – Reference at position given during input

  • try_longest_compatible (bool) – True to try the longest compatible remaining transcript if the MANE transcript is not compatible. False otherwise.

  • coordinate_type (CoordinateType) – Coordinate type of start_pos and end_pos. The result always uses inter-residue coordinates.

Return type:

UnionType[DataRepresentation, CdnaRepresentation, None]

Returns:

MANE data, or data for the longest compatible transcript, if validation checks pass; otherwise None. Coordinates are always inter-residue.
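The rules for which annotation layer get_mane_transcript returns can be summarized as a small decision function. This is only a sketch of the documented behavior: the string labels and the `returned_layer` helper are hypothetical and exist for illustration, not as part of cool_seq_tool.

```python
from typing import Optional


def returned_layer(start_layer: str, gene: Optional[str] = None) -> str:
    """Name the layer of the result for a given starting layer.

    Hypothetical helper; plain strings stand in for AnnotationLayer members.
    """
    if start_layer in ("protein", "cdna"):
        return start_layer  # protein stays protein, cDNA stays cDNA
    if start_layer == "genomic":
        # Gene context decides between a transcript and a genomic result
        return "cdna" if gene else "genomic_grch38"
    raise ValueError(f"unknown annotation layer: {start_layer!r}")


print(returned_layer("protein"))               # protein
print(returned_layer("genomic", gene="EGFR"))  # cdna
print(returned_layer("genomic"))               # genomic_grch38
```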

static get_reading_frame(pos)

Return reading frame number. Only used on c. coordinate.

Parameters:

pos (int) – cDNA position

Return type:

int

Returns:

Reading frame
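One plausible way to compute a reading frame from a c. position is shown below. It assumes frames are numbered 1 to 3 and that c. position 1 is the first base of a codon. Neither assumption is stated on this page, so treat this as a sketch rather than the library's implementation. `reading_frame` is a hypothetical name.

```python
def reading_frame(pos: int) -> int:
    """Return 1, 2, or 3 for a 1-based cDNA position.

    Assumes c.1 is the first base of the first codon (illustrative only).
    """
    frame = pos % 3
    # A remainder of 0 means the third base of a codon
    return 3 if frame == 0 else frame


for c_pos in (1, 2, 3, 4):
    print(c_pos, reading_frame(c_pos))
```

Under these assumptions, two positions share a reading frame exactly when they differ by a multiple of three.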

async grch38_to_mane_c_p(alt_ac, start_pos, end_pos, gene=None, coordinate_type=CoordinateType.RESIDUE, try_longest_compatible=False)

Given GRCh38 genomic representation, return cDNA and protein representation.

Will try MANE Select and then MANE Plus Clinical. If neither is found and try_longest_compatible is True, will also try to find the longest compatible remaining representation.

Parameters:
  • alt_ac (str) – Genomic RefSeq accession on GRCh38

  • start_pos (int) – Start position

  • end_pos (int) – End position

  • gene (Optional[str]) – HGNC gene symbol

  • coordinate_type (CoordinateType) – Coordinate type of start_pos and end_pos. The result always uses inter-residue coordinates.

  • try_longest_compatible (bool) – True to try the longest compatible remaining transcript if the MANE transcript(s) are not compatible. False otherwise.

Return type:

Optional[ProteinAndCdnaRepresentation]

Returns:

If successful, MANE data or, when try_longest_compatible is True, the longest compatible remaining cDNA and protein representation. Coordinates are always inter-residue.
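The fallback order described for grch38_to_mane_c_p (MANE Select, then MANE Plus Clinical, then optionally the longest compatible remaining transcript) can be sketched with plain callables. Everything here is hypothetical scaffolding: `pick_representation` and the lambda lookups stand in for the real database-backed steps.

```python
from typing import Callable, Optional

# A lookup returns a result label, or None when nothing compatible is found
Lookup = Callable[[], Optional[str]]


def pick_representation(
    mane_select: Lookup,
    mane_plus_clinical: Lookup,
    longest_compatible: Lookup,
    try_longest_compatible: bool = False,
) -> Optional[str]:
    """Return the first non-None result, following the documented order."""
    for lookup in (mane_select, mane_plus_clinical):
        result = lookup()
        if result is not None:
            return result
    # The final fallback runs only when the caller opts in
    return longest_compatible() if try_longest_compatible else None


def found_nothing() -> Optional[str]:
    return None


print(pick_representation(found_nothing, lambda: "plus_clinical", lambda: "longest"))
print(pick_representation(found_nothing, found_nothing, lambda: "longest"))
print(pick_representation(found_nothing, found_nothing, lambda: "longest", True))
```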

validate_index(ac, pos, coding_start_site)

Validate that positions actually exist on accession.

Parameters:
  • ac (str) – Accession

  • pos (tuple[int, int]) – Start position change, End position change

  • coding_start_site (int) – coding start site for accession

Return type:

bool

Returns:

True if positions exist on accession. False otherwise
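A rough sketch of this kind of bounds check against an in-memory sequence follows. It assumes pos holds inter-residue positions relative to the coding start site, so coding_start_site is added before comparing with the sequence length. That offset handling is an assumption, and `positions_exist` is a hypothetical helper that reads a plain string rather than querying SeqRepo.

```python
def positions_exist(sequence: str, pos: tuple[int, int], coding_start_site: int) -> bool:
    """Check that an offset range falls inside the sequence.

    Hypothetical helper; assumes inter-residue positions relative to the
    coding start site.
    """
    start = pos[0] + coding_start_site
    end = pos[1] + coding_start_site
    return 0 <= start <= end <= len(sequence)


toy_sequence = "ACGTACGTAC"  # placeholder 10-base sequence, not real data

print(positions_exist(toy_sequence, (0, 3), coding_start_site=2))  # True
print(positions_exist(toy_sequence, (5, 9), coding_start_site=2))  # False
```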

class cool_seq_tool.mappers.mane_transcript.ProteinAndCdnaRepresentation(**data)

Define object model for protein and cDNA representation.

cdna: CdnaRepresentation
protein: DataRepresentation
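To show how the nested result is read, the sketch below mirrors the documented fields with standard-library dataclasses. These are stand-ins, not the library's models. The g_to_mane_c example reads status from a CdnaRepresentation, which suggests it also carries the DataRepresentation fields, but that inheritance is an assumption here. All values are obvious placeholders, not real transcript data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataRepresentationSketch:
    gene: Optional[str]
    refseq: Optional[str]
    ensembl: Optional[str]
    pos: tuple[int, int]
    strand: int  # stand-in for the Strand type
    status: str  # stand-in for the TranscriptPriority type


@dataclass
class CdnaRepresentationSketch(DataRepresentationSketch):
    # Assumed to extend the base representation (see lead-in)
    coding_start_site: int
    coding_end_site: int
    alt_ac: Optional[str] = None


@dataclass
class ProteinAndCdnaRepresentationSketch:
    protein: DataRepresentationSketch
    cdna: CdnaRepresentationSketch


# Placeholder values only
protein = DataRepresentationSketch("GENE1", "NP_000000.0", None, (10, 11), 1, "mane_select")
cdna = CdnaRepresentationSketch(
    "GENE1", "NM_000000.0", None, (30, 33), 1, "mane_select",
    coding_start_site=100, coding_end_site=400,
)
both = ProteinAndCdnaRepresentationSketch(protein=protein, cdna=cdna)

print(both.protein.refseq, both.cdna.refseq, both.cdna.coding_start_site)
```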