cool_seq_tool.mappers.exon_genomic_coords
Provide mapping capabilities between transcript exon and genomic coordinates.
- class cool_seq_tool.mappers.exon_genomic_coords.ExonGenomicCoordsMapper(seqrepo_access, uta_db, mane_transcript, mane_transcript_mappings)
Provide capabilities for mapping transcript exon representation to/from genomic coordinate representation.
- __init__(seqrepo_access, uta_db, mane_transcript, mane_transcript_mappings)
Initialize ExonGenomicCoordsMapper class.
A lot of resources are required for initialization, so when defaults are enough, it’s easiest to let the core CoolSeqTool class handle it for you:
>>> from cool_seq_tool.app import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
Note that this class's mapping methods (transcript_to_genomic_coordinates and genomic_to_transcript_exon_coordinates) are defined as async, so they must be called with await from inside a coroutine, or run from an event loop. See the Usage section for more information.
>>> import asyncio
>>> result = asyncio.run(egc.transcript_to_genomic_coordinates(
...     "NM_002529.3",
...     exon_start=2,
...     exon_end=17
... ))
>>> result.genomic_data.start, result.genomic_data.end
(156864428, 156881456)
- Parameters:
seqrepo_access – SeqRepo instance to give access to query SeqRepo database
uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database
mane_transcript (ManeTranscript) – Instance to align to MANE or compatible representation
mane_transcript_mappings (ManeTranscriptMappings) – Instance to provide access to ManeTranscriptMappings class
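The calling pattern for async methods can be tried without any databases. The sketch below uses a hypothetical stand-in coroutine, fake_mapper_call, which is not part of cool_seq_tool and performs no real lookup; it only illustrates awaiting from inside a coroutine and handing a coroutine to an event loop from synchronous code.

```python
import asyncio


async def fake_mapper_call(transcript, exon_start=None, exon_end=None):
    """Hypothetical stand-in for an async mapper method (no real lookup)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real I/O call would
    return {"transcript": transcript, "exon_start": exon_start, "exon_end": exon_end}


async def main():
    # Inside a coroutine, call async methods with `await`.
    single = await fake_mapper_call("NM_002529.3", exon_start=2, exon_end=17)

    # Independent queries can be scheduled together; gather preserves call order.
    batch = await asyncio.gather(
        fake_mapper_call("NM_002529.3", exon_start=2),
        fake_mapper_call("NM_152263.3", exon_end=8),
    )
    return single, batch


# From synchronous code, hand the coroutine to an event loop.
single, batch = asyncio.run(main())
```

With a real mapper, the same structure should apply: replace fake_mapper_call with the awaited method on the object returned by CoolSeqTool().ex_g_coords_mapper.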
- async genomic_to_transcript_exon_coordinates(chromosome=None, alt_ac=None, start=None, end=None, strand=None, transcript=None, get_nearest_transcript_junction=False, gene=None, residue_mode=ResidueMode.RESIDUE)
Get transcript data for genomic data, lifted over to GRCh38.
MANE Transcript data will be returned if and only if transcript is not supplied. gene must be given in order to retrieve MANE Transcript data.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> from cool_seq_tool.schemas import Strand
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> result = asyncio.run(egc.genomic_to_transcript_exon_coordinates(
...     alt_ac="NC_000001.11",
...     start=154192136,
...     end=154170400,
...     strand=Strand.NEGATIVE,
...     transcript="NM_152263.3"
... ))
>>> result.genomic_data.exon_start, result.genomic_data.exon_end
(1, 8)
- Parameters:
chromosome (Optional[str]) – Chromosome. Must be given without a prefix (e.g. 1 or X). If not provided, must provide alt_ac. If alt_ac is also provided, alt_ac will be used.
alt_ac (Optional[str]) – Genomic accession (e.g. NC_000001.11). If not provided, must provide chromosome. If chromosome is also provided, alt_ac will be used.
start (Optional[int]) – Start genomic position
end (Optional[int]) – End genomic position
strand (Optional[Strand]) – Strand
transcript (Optional[str]) – The transcript to use. If this is not given, we will try the following transcripts: MANE Select, MANE Clinical Plus, Longest Remaining Compatible Transcript. See the Transcript Selection policy page.
get_nearest_transcript_junction – If True, this will return the adjacent exon if the position specified by start or end does not occur on an exon. For the positive strand, adjacent is defined as the exon preceding the breakpoint for the 5' end and the exon following the breakpoint for the 3' end. For the negative strand, adjacent is defined as the exon following the breakpoint for the 5' end and the exon preceding the breakpoint for the 3' end.
residue_mode (Union[inter-residue, residue]) – Residue mode for start and end
- Return type:
- Returns:
Genomic data (inter-residue coordinates)
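The strand-dependent definition of "adjacent" given for get_nearest_transcript_junction can be illustrated with a small sketch. It is not the library's implementation: the function name adjacent_exon and its arguments are hypothetical, and it assumes that "preceding" and "following" are read in genomic coordinate order and that each exon is a (genomic_start, genomic_end) pair with start < end.

```python
from typing import List, Optional, Tuple


def adjacent_exon(
    exons: List[Tuple[int, int]],
    breakpoint: int,
    is_positive_strand: bool,
    is_five_prime_end: bool,
) -> Optional[int]:
    """Return the index of the exon adjacent to a breakpoint between exons.

    Hypothetical helper. "Preceding" means the nearest exon ending at or
    before the breakpoint; "following" means the nearest exon starting at
    or after it (both in genomic coordinate order, an assumption).
    """
    # Preceding exon: positive strand + 5' end, or negative strand + 3' end.
    want_preceding = is_positive_strand == is_five_prime_end
    if want_preceding:
        candidates = [(end, i) for i, (_, end) in enumerate(exons) if end <= breakpoint]
        return max(candidates)[1] if candidates else None
    # Following exon: positive strand + 3' end, or negative strand + 5' end.
    candidates = [(start, i) for i, (start, _) in enumerate(exons) if start >= breakpoint]
    return min(candidates)[1] if candidates else None


exons = [(100, 200), (300, 400), (500, 600)]
positive_five_prime = adjacent_exon(exons, 250, True, True)
negative_five_prime = adjacent_exon(exons, 250, False, True)
```

Under these assumptions, a breakpoint at 250 resolves to the exon at index 0 for the 5' end on the positive strand and to the exon at index 1 for the 5' end on the negative strand.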
- get_tx_exon_coords(transcript, tx_exons, exon_start=None, exon_end=None)
Get exon coordinates for exon_start and exon_end
- Parameters:
transcript (str) – Transcript accession
tx_exons (List[Tuple[int, int]]) – List of all transcript exons and coordinates
exon_start (Optional[int]) – Start exon number
exon_end (Optional[int]) – End exon number
- Return type:
Tuple[Optional[Tuple[Optional[Tuple[int, int]], Optional[Tuple[int, int]]]], Optional[str]]
- Returns:
[Transcript start exon coords, Transcript end exon coords], and warnings if found
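The shape of this return value (a pair of optional exon coordinates, plus an optional warning) can be sketched as follows. This is an illustration rather than the library's code: pick_exon_coords is a hypothetical helper, the warning text is invented for the example, and exon numbers are assumed to be 1-based indexes into tx_exons.

```python
from typing import List, Optional, Tuple

Coords = Tuple[int, int]


def pick_exon_coords(
    tx_exons: List[Coords],
    exon_start: Optional[int] = None,
    exon_end: Optional[int] = None,
) -> Tuple[Optional[Tuple[Optional[Coords], Optional[Coords]]], Optional[str]]:
    """Look up coordinates for 1-based exon numbers (hypothetical helper)."""
    picked: List[Optional[Coords]] = []
    for label, number in (("exon_start", exon_start), ("exon_end", exon_end)):
        if number is None:
            # Exon not requested: leave its slot empty.
            picked.append(None)
            continue
        if not 1 <= number <= len(tx_exons):
            # Report the problem as a warning string instead of raising.
            return None, f"{label} {number} is out of range (1 to {len(tx_exons)})"
        picked.append(tx_exons[number - 1])  # convert 1-based number to list index
    return (picked[0], picked[1]), None


exons = [(0, 234), (234, 360), (360, 720)]
both, warning = pick_exon_coords(exons, exon_start=1, exon_end=3)
missing, missing_warning = pick_exon_coords(exons, exon_end=4)
```

Returning a warning string alongside None, rather than raising, mirrors the return type documented above, where either element of the outer tuple may be empty.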
- async transcript_to_genomic_coordinates(transcript, gene=None, exon_start=None, exon_start_offset=0, exon_end=None, exon_end_offset=0)
Get genomic data given transcript data.
By default, transcript data is aligned to the GRCh38 assembly.
>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> tpm3 = asyncio.run(egc.transcript_to_genomic_coordinates(
...     "NM_152263.3",
...     gene="TPM3",
...     exon_start=1, exon_end=8,
... ))
>>> tpm3.genomic_data.chr, tpm3.genomic_data.start, tpm3.genomic_data.end
('NC_000001.11', 154192135, 154170399)
- Parameters:
transcript (str) – Transcript accession
gene (Optional[str]) – HGNC gene symbol
exon_start (Optional[int]) – Starting transcript exon number (1-based). If not provided, must provide exon_end
exon_start_offset (int) – Starting exon offset
exon_end (Optional[int]) – Ending transcript exon number (1-based). If not provided, must provide exon_start
exon_end_offset (int) – Ending exon offset
- Return type:
- Returns:
GRCh38 genomic data (inter-residue coordinates)
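The returned positions are described as inter-residue coordinates. As a rough illustration of how these relate to residue coordinates, the sketch below assumes the common convention in which the 1-based residue at position p lies between inter-residue positions p - 1 and p. The helper residue_to_inter_residue is hypothetical and not part of cool_seq_tool, and it makes no claim about how the library converts coordinates internally.

```python
from typing import Tuple


def residue_to_inter_residue(position: int) -> Tuple[int, int]:
    """Return the inter-residue positions flanking a 1-based residue.

    Assumes the convention that residue p spans the inter-residue
    interval (p - 1, p).
    """
    if position < 1:
        raise ValueError("residue positions are 1-based")
    return position - 1, position


flanks = residue_to_inter_residue(154192136)
```

Under this convention, residue 154192136 is flanked by inter-residue positions 154192135 and 154192136.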