cool_seq_tool.mappers.exon_genomic_coords

Provide mapping capabilities between transcript exon and genomic coordinates.

class cool_seq_tool.mappers.exon_genomic_coords.ExonGenomicCoordsMapper(seqrepo_access, uta_db, mane_transcript, mane_transcript_mappings)

Provide capabilities for mapping transcript exon representation to/from genomic coordinate representation.

__init__(seqrepo_access, uta_db, mane_transcript, mane_transcript_mappings)

Initialize ExonGenomicCoordsMapper class.

Initialization requires several resources, so when the defaults are sufficient, it’s easiest to let the core CoolSeqTool class handle it for you:

>>> from cool_seq_tool.app import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper

Note that this class’s mapping methods (genomic_to_transcript_exon_coordinates and transcript_to_genomic_coordinates) are defined as async, so they must be awaited when called from within a coroutine, or run in an event loop (e.g. with asyncio.run). See the Usage section for more information.

>>> import asyncio
>>> result = asyncio.run(egc.transcript_to_genomic_coordinates(
...     "NM_002529.3",
...     exon_start=2,
...     exon_end=17
... ))
>>> result.genomic_data.start, result.genomic_data.end
(156864428, 156881456)
Parameters:
  • seqrepo_access – SeqRepo instance providing access to the SeqRepo database

  • uta_db (UtaDatabase) – UtaDatabase instance providing access to the UTA database

  • mane_transcript (ManeTranscript) – Instance used to align to MANE or compatible representation

  • mane_transcript_mappings (ManeTranscriptMappings) – Instance providing access to the ManeTranscriptMappings class
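Because the mapping methods are coroutines, code that already runs inside an event loop should await them rather than call asyncio.run again. The sketch below shows that pattern with a hypothetical StubMapper standing in for ExonGenomicCoordsMapper; it does not query UTA or SeqRepo, and only the method name and keyword arguments are taken from transcript_to_genomic_coordinates as documented on this page.

```python
import asyncio
from typing import List, Optional, Tuple


class StubMapper:
    """Hypothetical stand-in for ExonGenomicCoordsMapper (no database access)."""

    async def transcript_to_genomic_coordinates(
        self,
        transcript: str,
        gene: Optional[str] = None,
        exon_start: Optional[int] = None,
        exon_start_offset: int = 0,
        exon_end: Optional[int] = None,
        exon_end_offset: int = 0,
    ) -> Tuple[str, Optional[int], Optional[int]]:
        await asyncio.sleep(0)  # yield to the event loop, as a real query would
        return transcript, exon_start, exon_end


async def map_exon_ranges(mapper, queries: List[Tuple[str, int, int]]):
    # Inside a coroutine: await the calls; asyncio.gather schedules them together
    # and returns the results in the same order as the queries.
    return await asyncio.gather(
        *(
            mapper.transcript_to_genomic_coordinates(tx, exon_start=s, exon_end=e)
            for tx, s, e in queries
        )
    )


# At the top level (no running loop), asyncio.run creates and closes one.
results = asyncio.run(
    map_exon_ranges(StubMapper(), [("NM_002529.3", 2, 17), ("NM_152263.3", 1, 8)])
)
print(results)
```

With the real class, CoolSeqTool().ex_g_coords_mapper would take the place of StubMapper(); whether running many queries concurrently suits a particular UTA connection setup is not covered here.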

async genomic_to_transcript_exon_coordinates(chromosome=None, alt_ac=None, start=None, end=None, strand=None, transcript=None, get_nearest_transcript_junction=False, gene=None, residue_mode=ResidueMode.RESIDUE)

Get transcript data for genomic data, lifted over to GRCh38.

MANE Transcript data will be returned if and only if transcript is not supplied, and gene must be given in order to retrieve it.
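When transcript is omitted, the transcript parameter documentation lists the order in which candidates are tried. A minimal sketch of that priority fallback follows. The Candidate record and its fields (mane_status, length, compatible) are hypothetical, not cool_seq_tool schemas, and "longest" is assumed to mean transcript length; the Transcript Selection policy page remains the authority on the real rules.

```python
from dataclasses import dataclass
from typing import List, Optional

# Priority order as listed for the ``transcript`` parameter on this page.
MANE_PRIORITY = ("MANE Select", "MANE Plus Clinical")


@dataclass
class Candidate:
    """Hypothetical transcript record; not a cool_seq_tool schema."""

    accession: str
    mane_status: Optional[str]  # e.g. "MANE Select", or None if not MANE
    length: int                 # assumed transcript length used for "longest"
    compatible: bool            # assumed precomputed compatibility flag


def pick_transcript(candidates: List[Candidate]) -> Optional[str]:
    # 1) MANE Select, then 2) MANE Plus Clinical
    for status in MANE_PRIORITY:
        for cand in candidates:
            if cand.mane_status == status:
                return cand.accession
    # 3) Longest remaining (non-MANE) compatible transcript
    remaining = [c for c in candidates if c.mane_status is None and c.compatible]
    if not remaining:
        return None
    return max(remaining, key=lambda c: c.length).accession
```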

>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import Strand
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> result = asyncio.run(egc.genomic_to_transcript_exon_coordinates(
...     alt_ac="NC_000001.11",
...     start=154192136,
...     end=154170400,
...     strand=Strand.NEGATIVE,
...     transcript="NM_152263.3"
... ))
>>> result.genomic_data.exon_start, result.genomic_data.exon_end
(1, 8)
Parameters:
  • chromosome (Optional[str]) – Chromosome, given without a prefix (e.g. 1 or X). If not provided, alt_ac must be provided. If alt_ac is also provided, alt_ac will be used.

  • alt_ac (Optional[str]) – Genomic accession (e.g. NC_000001.11). If not provided, chromosome must be provided. If chromosome is also provided, alt_ac will be used.

  • start (Optional[int]) – Start genomic position

  • end (Optional[int]) – End genomic position

  • strand (Optional[Strand]) – Strand

  • transcript (Optional[str]) – The transcript to use. If this is not given, we will try the following transcripts, in order: MANE Select, MANE Plus Clinical, Longest Remaining Compatible Transcript. See the Transcript Selection policy page.

  • get_nearest_transcript_junction (bool) – If True, return the adjacent exon if the position specified by start or end does not occur on an exon. For the positive strand, adjacent is defined as the exon preceding the breakpoint for the 5’ end and the exon following the breakpoint for the 3’ end. For the negative strand, adjacent is defined as the exon following the breakpoint for the 5’ end and the exon preceding the breakpoint for the 3’ end.

  • residue_mode (Union[inter-residue, residue]) – Residue mode for start and end
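The get_nearest_transcript_junction rule can be read as: the 5’ end always takes the exon upstream of the breakpoint in transcript order and the 3’ end the exon downstream, with "preceding" and "following" referring to genomic coordinate order. The sketch below encodes that reading. adjacent_exon_number is a hypothetical helper, not a cool_seq_tool function; it assumes exons are non-overlapping (start, end) genomic pairs sorted ascending and that strand is given as 1 or -1.

```python
from bisect import bisect_left
from typing import List, Tuple


def adjacent_exon_number(
    exons: List[Tuple[int, int]], pos: int, strand: int, is_five_prime: bool
) -> int:
    """Hypothetical helper: 1-based transcript exon number adjacent to an
    intronic ``pos``. ``exons`` are (start, end) pairs in ascending genomic order."""
    if any(start <= pos <= end for start, end in exons):
        raise ValueError("position lies on an exon; no adjustment needed")
    # Index of the first exon that starts after pos (the "following" exon).
    following = bisect_left([start for start, _ in exons], pos)
    preceding = following - 1
    # + strand: 5' end -> preceding, 3' end -> following; - strand: reversed.
    use_preceding = is_five_prime if strand == 1 else not is_five_prime
    idx = preceding if use_preceding else following
    if not 0 <= idx < len(exons):
        raise ValueError("no adjacent exon on that side of the position")
    # Transcript exons are numbered 5'->3', i.e. in reverse genomic order on - strand.
    return idx + 1 if strand == 1 else len(exons) - idx
```

For a breakpoint between the second and third genomic exons, the negative-strand 5’ end resolves to transcript exon 2 and the 3’ end to exon 3, so the fusion junction always spans consecutive exons in transcript order.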

Return type:

GenomicDataResponse

Returns:

Genomic data (inter-residue coordinates)
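The two examples on this page hint at how residue_mode affects the numbers: the residue positions start=154192136 and end=154170400 map to exons 1 and 8 of NM_152263.3, and transcript_to_genomic_coordinates reports those same exons as the inter-residue values 154192135 and 154170399. The sketch below encodes only that observation, shifting each residue-mode position down by one. to_inter_residue is a hypothetical helper, and the library’s own handling may differ in cases these examples do not cover.

```python
from typing import Optional

RESIDUE = "residue"
INTER_RESIDUE = "inter-residue"


def to_inter_residue(pos: Optional[int], residue_mode: str = RESIDUE) -> Optional[int]:
    """Hypothetical helper: shift a residue-mode (1-based) position by -1,
    matching the pairs of values shown in this page's examples."""
    if pos is None or residue_mode == INTER_RESIDUE:
        return pos  # nothing to convert
    if residue_mode != RESIDUE:
        raise ValueError(f"unknown residue mode: {residue_mode!r}")
    return pos - 1
```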

get_tx_exon_coords(transcript, tx_exons, exon_start=None, exon_end=None)

Get exon coordinates for exon_start and exon_end.

Parameters:
  • transcript (str) – Transcript accession

  • tx_exons (List[Tuple[int, int]]) – List of all transcript exons and coordinates

  • exon_start (Optional[int]) – Start exon number

  • exon_end (Optional[int]) – End exon number

Return type:

Tuple[Optional[Tuple[Optional[Tuple[int, int]], Optional[Tuple[int, int]]]], Optional[str]]

Returns:

[Transcript start exon coords, Transcript end exon coords], and a warning message, if one was generated
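The documented return type nests an optional pair of optional coordinate tuples alongside an optional warning string. A minimal sketch of that shape, assuming tx_exons is ordered by exon number so that exon n sits at index n - 1; select_exon_coords and its warning text are hypothetical, not the library’s implementation.

```python
from typing import List, Optional, Tuple

Coords = Tuple[int, int]


def select_exon_coords(
    transcript: str,
    tx_exons: List[Coords],
    exon_start: Optional[int] = None,
    exon_end: Optional[int] = None,
) -> Tuple[Optional[Tuple[Optional[Coords], Optional[Coords]]], Optional[str]]:
    """Hypothetical helper mirroring get_tx_exon_coords' documented return type."""
    n = len(tx_exons)
    # Validate 1-based exon numbers before indexing.
    for label, num in (("exon_start", exon_start), ("exon_end", exon_end)):
        if num is not None and not 1 <= num <= n:
            return None, f"{transcript} has {n} exons; {label}={num} is out of range"
    start_coords = tx_exons[exon_start - 1] if exon_start is not None else None
    end_coords = tx_exons[exon_end - 1] if exon_end is not None else None
    return (start_coords, end_coords), None
```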

async transcript_to_genomic_coordinates(transcript, gene=None, exon_start=None, exon_start_offset=0, exon_end=None, exon_end_offset=0)

Get genomic data given transcript data.

By default, transcript data is aligned to the GRCh38 assembly.

>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> tpm3 = asyncio.run(egc.transcript_to_genomic_coordinates(
...     "NM_152263.3",
...     gene="TPM3",
...     exon_start=1,
...     exon_end=8,
... ))
>>> tpm3.genomic_data.chr, tpm3.genomic_data.start, tpm3.genomic_data.end
('NC_000001.11', 154192135, 154170399)
Parameters:
  • transcript (str) – Transcript accession

  • gene (Optional[str]) – HGNC gene symbol

  • exon_start (Optional[int]) – Starting transcript exon number (1-based). If not provided, must provide exon_end

  • exon_start_offset (int) – Starting exon offset

  • exon_end (Optional[int]) – Ending transcript exon number (1-based). If not provided, must provide exon_start

  • exon_end_offset (int) – Ending exon offset

Return type:

GenomicDataResponse

Returns:

GRCh38 genomic data (inter-residue coordinates)
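In the negative-strand NM_152263.3 (TPM3) example, the returned start (154192135) is greater than the end (154170399), which fits start being the 5’ (transcript-order) boundary of exon_start and end the 3’ boundary of exon_end. The sketch below encodes that reading and additionally assumes offsets are counted in transcript direction, so they are subtracted on the negative strand. boundary_to_genomic is hypothetical, strand is assumed to be 1 or -1, and the library’s exact offset convention is not stated on this page.

```python
def boundary_to_genomic(
    genomic_lo: int, genomic_hi: int, strand: int, is_start_exon: bool, offset: int = 0
) -> int:
    """Hypothetical helper. ``genomic_lo``/``genomic_hi`` are the exon's genomic
    bounds in ascending order; the offset is assumed to run in transcript direction."""
    if strand == 1:
        # Positive strand: the 5' boundary is the lower coordinate.
        boundary = genomic_lo if is_start_exon else genomic_hi
        return boundary + offset
    # Negative strand: the transcript runs from high to low coordinates.
    boundary = genomic_hi if is_start_exon else genomic_lo
    return boundary - offset
```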