cool_seq_tool.mappers.exon_genomic_coords
Provide mapping capabilities between transcript exon and genomic coordinates.
- class cool_seq_tool.mappers.exon_genomic_coords.ExonGenomicCoordsMapper(seqrepo_access, uta_db, mane_transcript, mane_transcript_mappings)
Provide capabilities for mapping transcript exon representation to/from genomic coordinate representation.
- __init__(seqrepo_access, uta_db, mane_transcript, mane_transcript_mappings)
Initialize ExonGenomicCoordsMapper class.
A lot of resources are required for initialization, so when defaults are enough, it’s easiest to let the core CoolSeqTool class handle it for you:
>>> from cool_seq_tool.app import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
Note that this class’s public methods are all defined as async, so they will need to be called with await when called from a function, or run from an event loop. See the Usage section for more information.
>>> import asyncio
>>> result = asyncio.run(
...     egc.transcript_to_genomic_coordinates(
...         "NM_002529.3", exon_start=2, exon_end=17
...     )
... )
>>> result.genomic_data.start, result.genomic_data.end
(156864428, 156881456)
- Parameters:
seqrepo_access (SeqRepoAccess) – SeqRepo instance to give access to query SeqRepo database
uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database
mane_transcript (ManeTranscript) – Instance to align to MANE or compatible representation
mane_transcript_mappings (ManeTranscriptMappings) – Instance to provide access to ManeTranscriptMappings class
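As a sketch of the calling pattern described in the note above: inside a coroutine the mapper's methods are awaited directly, and asyncio.run is only needed at the synchronous entry point. StubMapper below is a hypothetical stand-in that returns canned dictionaries so the example runs without SeqRepo or UTA; it is not part of cool_seq_tool, and the real method returns a response object rather than a dict.

```python
import asyncio


class StubMapper:
    """Hypothetical stand-in for ExonGenomicCoordsMapper (assumption, not the real class)."""

    async def transcript_to_genomic_coordinates(
        self, transcript, exon_start=None, exon_end=None
    ):
        # Yield to the event loop once, as a real database query would.
        await asyncio.sleep(0)
        return {
            "transcript": transcript,
            "exon_start": exon_start,
            "exon_end": exon_end,
        }


async def lookup_all(mapper, requests):
    """Await each lookup in turn from inside a coroutine."""
    results = []
    for transcript, exon_start, exon_end in requests:
        # Inside an async function, use `await` rather than asyncio.run().
        result = await mapper.transcript_to_genomic_coordinates(
            transcript, exon_start=exon_start, exon_end=exon_end
        )
        results.append(result)
    return results


# At the synchronous entry point, start the event loop exactly once.
results = asyncio.run(
    lookup_all(StubMapper(), [("NM_002529.3", 2, 17), ("NM_152263.3", 1, 8)])
)
```

With the real mapper, the same shape should apply by passing CoolSeqTool().ex_g_coords_mapper in place of StubMapper(), assuming the required data sources are available.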
- async genomic_to_transcript_exon_coordinates(chromosome=None, alt_ac=None, start=None, end=None, strand=None, transcript=None, get_nearest_transcript_junction=False, gene=None, residue_mode=ResidueMode.RESIDUE)
Get transcript data for genomic data, lifted over to GRCh38.
MANE Transcript data will be returned if and only if transcript is not supplied. gene must be given in order to retrieve MANE Transcript data.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import Strand
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> result = asyncio.run(
...     egc.genomic_to_transcript_exon_coordinates(
...         alt_ac="NC_000001.11",
...         start=154192136,
...         end=154170400,
...         strand=Strand.NEGATIVE,
...         transcript="NM_152263.3",
...     )
... )
>>> result.genomic_data.exon_start, result.genomic_data.exon_end
(1, 8)
- Parameters:
chromosome (Optional[str]) – Chromosome. Must give chromosome without a prefix (e.g. 1 or X). If not provided, must provide alt_ac. If alt_ac is also provided, alt_ac will be used.
alt_ac (Optional[str]) – Genomic accession (e.g. NC_000001.11). If not provided, must provide chromosome. If chromosome is also provided, alt_ac will be used.
start (Optional[int]) – Start genomic position
end (Optional[int]) – End genomic position
strand (Optional[Strand]) – Strand
transcript (Optional[str]) – The transcript to use. If this is not given, we will try the following transcripts: MANE Select, MANE Clinical Plus, Longest Remaining Compatible Transcript. See the Transcript Selection policy page.
get_nearest_transcript_junction (bool) – If True, this will return the adjacent exon if the position specified by start or end does not occur on an exon. For the positive strand, adjacent is defined as the exon preceding the breakpoint for the 5’ end and the exon following the breakpoint for the 3’ end. For the negative strand, adjacent is defined as the exon following the breakpoint for the 5’ end and the exon preceding the breakpoint for the 3’ end.
residue_mode (Union[Literal[<ResidueMode.INTER_RESIDUE: 'inter-residue'>], Literal[<ResidueMode.RESIDUE: 'residue'>]]) – Residue mode for start and end
- Return type:
- Returns:
Genomic data (inter-residue coordinates)
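The adjacency rule described for get_nearest_transcript_junction can be illustrated with a small standalone sketch. It assumes that "preceding" and "following" refer to genomic coordinate order, and that exons are given as (exon_number, genomic_start, genomic_end) tuples with genomic_start < genomic_end. The Strand enum, the adjacent_exon function, and the coordinates are all hypothetical and made up for illustration; this is not the library's implementation.

```python
from enum import Enum


class Strand(Enum):
    """Hypothetical stand-in for the library's Strand enum."""

    POSITIVE = 1
    NEGATIVE = -1


def adjacent_exon(exons, breakpoint, strand, five_prime_end):
    """Return the number of the exon adjacent to a non-exonic breakpoint.

    exons: list of (exon_number, genomic_start, genomic_end) tuples.
    five_prime_end: True for the 5' end, False for the 3' end.
    Returns None if no exon lies on the required side of the breakpoint.
    """
    ordered = sorted(exons, key=lambda exon: exon[1])
    # Positive strand: 5' end -> preceding exon, 3' end -> following exon.
    # Negative strand: 5' end -> following exon, 3' end -> preceding exon.
    want_preceding = (strand is Strand.POSITIVE) == five_prime_end
    if want_preceding:
        candidates = [exon for exon in ordered if exon[2] <= breakpoint]
        return candidates[-1][0] if candidates else None
    candidates = [exon for exon in ordered if exon[1] >= breakpoint]
    return candidates[0][0] if candidates else None


# Made-up coordinates: on the negative strand, exon 1 has the highest coordinates.
positive_exons = [(1, 100, 200), (2, 300, 400), (3, 500, 600)]
negative_exons = [(1, 500, 600), (2, 300, 400), (3, 100, 200)]

pos_five = adjacent_exon(positive_exons, 250, Strand.POSITIVE, five_prime_end=True)
pos_three = adjacent_exon(positive_exons, 250, Strand.POSITIVE, five_prime_end=False)
neg_five = adjacent_exon(negative_exons, 450, Strand.NEGATIVE, five_prime_end=True)
neg_three = adjacent_exon(negative_exons, 450, Strand.NEGATIVE, five_prime_end=False)
```

Under this reading, the 5’ end resolves to the exon upstream of the breakpoint in transcript order on either strand, and the 3’ end to the exon downstream of it.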
- get_tx_exon_coords(transcript, tx_exons, exon_start=None, exon_end=None)
Get exon coordinates for exon_start and exon_end
- Parameters:
transcript (str) – Transcript accession
tx_exons (list[tuple[int, int]]) – List of all transcript exons and coordinates
exon_start (Optional[int]) – Start exon number
exon_end (Optional[int]) – End exon number
- Return type:
tuple[Optional[tuple[Optional[tuple[int, int]], Optional[tuple[int, int]]]], Optional[str]]
- Returns:
[Transcript start exon coords, Transcript end exon coords], and warnings if found
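To make the return shape above concrete, here is a minimal sketch of a lookup with the same signature and return type. It assumes exon numbers are 1-based indexes into tx_exons and that a missing exon produces a warning string instead of coordinates. The function pick_exon_coords, the warning wording, the accession, and the exon coordinates are hypothetical and are not taken from the library.

```python
from typing import Optional

ExonCoords = tuple[int, int]


def pick_exon_coords(
    transcript: str,
    tx_exons: list[ExonCoords],
    exon_start: Optional[int] = None,
    exon_end: Optional[int] = None,
) -> tuple[
    Optional[tuple[Optional[ExonCoords], Optional[ExonCoords]]], Optional[str]
]:
    """Return ((start exon coords, end exon coords), warning)."""
    picked: list[Optional[ExonCoords]] = []
    for exon_number in (exon_start, exon_end):
        if exon_number is None:
            # An exon that was not requested is reported as None.
            picked.append(None)
            continue
        if not 1 <= exon_number <= len(tx_exons):
            # Hypothetical warning text; the library's wording may differ.
            return None, f"Exon {exon_number} does not exist on {transcript}"
        # Assumption: exon numbers are 1-based, list indexes are 0-based.
        picked.append(tx_exons[exon_number - 1])
    return (picked[0], picked[1]), None


# Made-up transcript accession and exon coordinates, for illustration only.
example_exons = [(0, 234), (234, 360), (360, 494)]
both, both_warning = pick_exon_coords("NM_000000.0", example_exons, 1, 3)
start_only, _ = pick_exon_coords("NM_000000.0", example_exons, exon_start=2)
missing, missing_warning = pick_exon_coords("NM_000000.0", example_exons, exon_end=9)
```

Returning the warning alongside the coordinates, rather than raising, mirrors the documented return type, where either element of the outer tuple may be None.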
- async transcript_to_genomic_coordinates(transcript, gene=None, exon_start=None, exon_start_offset=0, exon_end=None, exon_end_offset=0)
Get genomic data given transcript data.
By default, transcript data is aligned to the GRCh38 assembly.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> tpm3 = asyncio.run(
...     egc.transcript_to_genomic_coordinates(
...         "NM_152263.3",
...         gene="TPM3",
...         exon_start=1,
...         exon_end=8,
...     )
... )
>>> tpm3.genomic_data.chr, tpm3.genomic_data.start, tpm3.genomic_data.end
('NC_000001.11', 154192135, 154170399)
- Parameters:
transcript (str) – Transcript accession
gene (Optional[str]) – HGNC gene symbol
exon_start (Optional[int]) – Starting transcript exon number (1-based). If not provided, must provide exon_end
exon_start_offset (int) – Starting exon offset
exon_end (Optional[int]) – Ending transcript exon number (1-based). If not provided, must provide exon_start
exon_end_offset (int) – Ending exon offset
- Return type:
- Returns:
GRCh38 genomic data (inter-residue coordinates)