cool_seq_tool.mappers.exon_genomic_coords

Provide mapping capabilities between transcript exon and genomic coordinates.

class cool_seq_tool.mappers.exon_genomic_coords.ExonGenomicCoordsMapper(seqrepo_access, uta_db, mane_transcript_mappings, liftover)

Provide capabilities for mapping transcript exon representation to/from genomic coordinate representation.

__init__(seqrepo_access, uta_db, mane_transcript_mappings, liftover)

Initialize ExonGenomicCoordsMapper class.

A lot of resources are required for initialization, so when defaults are enough, it’s easiest to let the core CoolSeqTool class handle it for you:

>>> from cool_seq_tool import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper

Note that this class’s public methods are all defined as async, so they must be awaited when called from within a coroutine, or run via an event loop. See the Usage section for more information.

>>> import asyncio
>>> result = asyncio.run(
...     egc.tx_segment_to_genomic("NM_002529.3", exon_start=2, exon_end=17)
... )
>>> seg_start, seg_end = result.seg_start, result.seg_end
Parameters:
  • seqrepo_access (SeqRepoAccess) – SeqRepo instance to give access to query SeqRepo database

  • uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database

  • mane_transcript_mappings (ManeTranscriptMappings) – Instance to provide access to ManeTranscriptMappings class

  • liftover (LiftOver) – Instance to provide mapping between human genome assemblies
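
The note above about async methods can be illustrated without any database setup. The following is a minimal sketch: fake_tx_segment_to_genomic is a hypothetical stand-in coroutine (it is not part of cool_seq_tool and performs no real lookup), used only to show awaiting from inside a coroutine versus driving the coroutine with asyncio.run.

```python
import asyncio


async def fake_tx_segment_to_genomic(transcript: str, exon_start: int, exon_end: int) -> dict:
    """Hypothetical stand-in for an async mapper method; performs no real lookup."""
    await asyncio.sleep(0)  # yield to the event loop, as a real query would
    return {"transcript": transcript, "exon_start": exon_start, "exon_end": exon_end}


async def main() -> list:
    # Inside a coroutine, an async method is awaited directly.
    first = await fake_tx_segment_to_genomic("NM_002529.3", exon_start=2, exon_end=17)
    # Independent calls can also be scheduled together with asyncio.gather.
    rest = await asyncio.gather(
        fake_tx_segment_to_genomic("NM_152263.3", exon_start=1, exon_end=8),
    )
    return [first, *rest]


# From synchronous code, asyncio.run starts an event loop and runs the coroutine.
results = asyncio.run(main())
print([r["transcript"] for r in results])
```

Whether concurrent calls are appropriate with a real mapper instance depends on the underlying database connections, which this sketch does not model.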

async genomic_to_tx_segment(chromosome=None, genomic_ac=None, seg_start_genomic=None, seg_end_genomic=None, transcript=None, get_nearest_transcript_junction=False, gene=None)

Get transcript segment data for genomic data, lifted over to GRCh38.

If liftover to GRCh38 is unsuccessful, errors are returned.

Must provide inter-residue coordinates.
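
For callers holding 1-based, inclusive (residue) positions, a conversion to inter-residue coordinates is needed first. The sketch below shows the general convention only; both helper names are hypothetical and are not part of cool_seq_tool.

```python
def residue_to_inter_residue(start: int, end: int) -> tuple:
    """Convert a 1-based inclusive range to inter-residue (0-based, half-open)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the end value is numerically unchanged.
    return start - 1, end


def inter_residue_to_residue(start: int, end: int) -> tuple:
    """Convert a non-empty inter-residue range back to 1-based inclusive."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# Residues 5 through 7 (three bases) sit between inter-residue positions 4 and 7.
print(residue_to_inter_residue(5, 7))  # (4, 7)
```

A useful check is that in inter-residue coordinates the span length is simply end - start.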

MANE Transcript data will be returned if and only if transcript is not supplied. gene must be given in order to retrieve MANE Transcript data.

>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> result = asyncio.run(
...     egc.genomic_to_tx_segment(
...         genomic_ac="NC_000001.11",
...         seg_start_genomic=154192135,
...         seg_end_genomic=154170399,
...         transcript="NM_152263.3",
...     )
... )
>>> result.seg_start.exon_ord, result.seg_end.exon_ord
(0, 7)
Parameters:
  • chromosome (Optional[str]) – e.g. "1" or "chr1". If not provided, must provide genomic_ac. If genomic_ac is also provided, genomic_ac will be used.

  • genomic_ac (Optional[str]) – Genomic accession (e.g. NC_000001.11). If not provided, must provide chromosome. If chromosome is also provided, genomic_ac will be used.

  • seg_start_genomic (Optional[int]) – Genomic position where the transcript segment starts

  • seg_end_genomic (Optional[int]) – Genomic position where the transcript segment ends

  • transcript (Optional[str]) – The transcript to use. If this is not given, we will try the following transcripts: MANE Select, MANE Clinical Plus, Longest Remaining Compatible Transcript. See the Transcript Selection policy page.

  • get_nearest_transcript_junction (bool) – If True, return the adjacent exon if the position specified by seg_start_genomic or seg_end_genomic does not occur on an exon. For the positive strand, adjacent is defined as the exon preceding the breakpoint for the 5’ end and the exon following the breakpoint for the 3’ end. For the negative strand, adjacent is defined as the exon following the breakpoint for the 5’ end and the exon preceding the breakpoint for the 3’ end.

  • gene (Optional[str]) – Gene name, ideally an HGNC symbol. Must be given if no transcript value is provided.

Return type:

GenomicTxSegService

Returns:

Genomic data (inter-residue coordinates)
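
The adjacency rule described for get_nearest_transcript_junction can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the Exon record and adjacent_exon function are hypothetical, and "preceding" and "following" are read as lower and higher genomic coordinates respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record: 0-based ordinal plus its genomic span."""

    ord: int
    start: int  # lower genomic coordinate
    end: int  # higher genomic coordinate


def adjacent_exon(exons: list, breakpoint: int, strand: str, end_type: str) -> Exon:
    """Pick the exon adjacent to a breakpoint that does not fall on an exon.

    strand is "+" or "-"; end_type is "5p" or "3p".
    Assumption: "preceding" means lower genomic coordinates than the breakpoint.
    """
    # Positive strand: 5' end -> preceding, 3' end -> following.
    # Negative strand: 5' end -> following, 3' end -> preceding.
    want_preceding = (strand == "+") == (end_type == "5p")
    if want_preceding:
        candidates = [e for e in exons if e.end <= breakpoint]
        chosen = max(candidates, key=lambda e: e.end, default=None)
    else:
        candidates = [e for e in exons if e.start >= breakpoint]
        chosen = min(candidates, key=lambda e: e.start, default=None)
    if chosen is None:
        raise ValueError("no adjacent exon on that side of the breakpoint")
    return chosen


# Made-up coordinates; on the negative strand exon 0 has the highest coordinates.
plus_exons = [Exon(0, 100, 200), Exon(1, 300, 400), Exon(2, 500, 600)]
minus_exons = [Exon(0, 500, 600), Exon(1, 300, 400), Exon(2, 100, 200)]

print(adjacent_exon(plus_exons, 250, "+", "5p").ord)   # 0
print(adjacent_exon(plus_exons, 250, "+", "3p").ord)   # 1
print(adjacent_exon(minus_exons, 450, "-", "5p").ord)  # 0
print(adjacent_exon(minus_exons, 450, "-", "3p").ord)  # 1
```

Under this reading, the 5’ case selects the exon that is upstream in transcript order on both strands, and the 3’ case selects the downstream one.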

async tx_segment_to_genomic(transcript, gene=None, exon_start=None, exon_start_offset=0, exon_end=None, exon_end_offset=0)

Get aligned genomic data given transcript segment data.

By default, transcript data is aligned to the GRCh38 assembly.

>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> tpm3 = asyncio.run(
...     egc.tx_segment_to_genomic(
...         "NM_152263.3",
...         gene="TPM3",
...         exon_start=1,
...         exon_end=8,
...     )
... )
>>> (
...     tpm3.genomic_ac,
...     tpm3.seg_start.genomic_location.end,
...     tpm3.seg_end.genomic_location.start,
... )
('NC_000001.11', 154192135, 154170399)
Parameters:
  • transcript (str) – RefSeq transcript accession

  • gene (Optional[str]) – HGNC gene symbol

  • exon_start (Optional[int]) – Starting transcript exon number (1-based). If not provided, must provide exon_end

  • exon_start_offset (int) – Starting exon offset

  • exon_end (Optional[int]) – Ending transcript exon number (1-based). If not provided, must provide exon_start

  • exon_end_offset (int) – Ending exon offset

Return type:

GenomicTxSegService

Returns:

GRCh38 genomic data (inter-residue coordinates)
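
The exon_start and exon_end parameters are 1-based exon numbers, while the exon_ord values shown in the genomic_to_tx_segment example are 0-based (exons 1 and 8 of NM_152263.3 correspond to ordinals 0 and 7). The sketch below makes that relationship and the "at least one of" requirement explicit; exon_numbers_to_ords is a hypothetical helper, not a cool_seq_tool function.

```python
from typing import Optional


def exon_numbers_to_ords(
    exon_start: Optional[int] = None, exon_end: Optional[int] = None
) -> tuple:
    """Map 1-based exon numbers to 0-based ordinals, keeping None as None."""
    if exon_start is None and exon_end is None:
        raise ValueError("must provide exon_start, exon_end, or both")
    for number in (exon_start, exon_end):
        if number is not None and number < 1:
            raise ValueError("exon numbers are 1-based")

    def to_ord(number: Optional[int]) -> Optional[int]:
        return None if number is None else number - 1

    return to_ord(exon_start), to_ord(exon_end)


print(exon_numbers_to_ords(exon_start=1, exon_end=8))  # (0, 7)
print(exon_numbers_to_ords(exon_end=8))  # (None, 7)
```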

class cool_seq_tool.mappers.exon_genomic_coords.GenomicTxSeg(**data)

Model for representing a boundary for a transcript segment.

classmethod check_errors(values)

Ensure that fields are (un)set depending on errors

Parameters:

values (dict) – Values in model

Raises:

ValueError – If seg, genomic_ac and tx_ac are not provided when there are no errors

Return type:

dict

Returns:

Values in model
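
The rule that check_errors enforces can be sketched with a plain dataclass. This is an analogue only, assuming the behavior described above: SegResult is a hypothetical stand-in, not the library's model, and it uses __post_init__ rather than the model's own validation mechanism.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SegResult:
    """Hypothetical stand-in for a result model; not the library's class."""

    seg: Optional[dict] = None
    genomic_ac: Optional[str] = None
    tx_ac: Optional[str] = None
    errors: list = field(default_factory=list)

    def __post_init__(self) -> None:
        required = (self.seg, self.genomic_ac, self.tx_ac)
        # With no errors reported, every required field must be populated.
        if not self.errors and not all(required):
            raise ValueError(
                "seg, genomic_ac and tx_ac must be provided when there are no errors"
            )


# A failed lookup carries only errors; a successful one carries the data fields.
failed = SegResult(errors=["Unable to find transcript"])
ok = SegResult(seg={"exon_ord": 0}, genomic_ac="NC_000001.11", tx_ac="NM_152263.3")

try:
    SegResult()  # neither errors nor data
except ValueError as exc:
    print(exc)
```

The effect is that a caller can rely on exactly one of two shapes: a result with errors, or a result with all data fields set.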

errors: list[Annotated[str]]
gene: Optional[Annotated[str]]
genomic_ac: Optional[Annotated[str]]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

seg: Optional[TxSegment]
tx_ac: Optional[Annotated[str]]

class cool_seq_tool.mappers.exon_genomic_coords.GenomicTxSegService(**data)

Service model for genomic and transcript data.

classmethod add_meta_check_errors(values)

Add service metadata to model and ensure that fields are (un)set depending on errors

Parameters:

values (dict) – Values in model

Raises:

ValueError – If genomic_ac, tx_ac and seg_start or seg_end are not provided when there are no errors

Return type:

dict

Returns:

Values in model, including service metadata

errors: list[Annotated[str]]
gene: Optional[Annotated[str]]
genomic_ac: Optional[Annotated[str]]
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

seg_end: Optional[TxSegment]
seg_start: Optional[TxSegment]
service_meta: ServiceMeta
tx_ac: Optional[Annotated[str]]

class cool_seq_tool.mappers.exon_genomic_coords.TxSegment(**data)

Model for representing transcript segment data.

exon_ord: Annotated[int]
genomic_location: SequenceLocation
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

offset: Annotated[int]