cool_seq_tool.mappers.exon_genomic_coords
Provide mapping capabilities between transcript exon and genomic coordinates.
- class cool_seq_tool.mappers.exon_genomic_coords.ExonGenomicCoordsMapper(seqrepo_access, uta_db, mane_transcript_mappings, liftover)
Provide capabilities for mapping transcript exon representation to/from genomic coordinate representation.
- __init__(seqrepo_access, uta_db, mane_transcript_mappings, liftover)
Initialize ExonGenomicCoordsMapper class.
A lot of resources are required for initialization, so when defaults are enough, it’s easiest to let the core CoolSeqTool class handle it for you:
>>> from cool_seq_tool import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
Note that this class’s public methods are all defined as async, so they will need to be called with await when called from a function, or run from an event loop. See the Usage section for more information.
>>> import asyncio
>>> result = asyncio.run(
...     egc.tx_segment_to_genomic("NM_002529.3", exon_start=2, exon_end=17)
... )
>>> result.genomic_data.start, result.genomic_data.end
(156864428, 156881456)
- Parameters:
seqrepo_access (SeqRepoAccess) – SeqRepo instance to give access to query SeqRepo database
uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database
mane_transcript_mappings (ManeTranscriptMappings) – Instance to provide access to ManeTranscriptMappings class
liftover (LiftOver) – Instance to provide mapping between human genome assemblies
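The note above about async methods can be illustrated with a small, self-contained sketch. It does not touch the real mapper (which needs SeqRepo and UTA to be available); `StubMapper` and `map_many` are hypothetical names introduced here, and the stub only mimics the shape of an async method so that the `await` and event-loop pattern can be run anywhere.

```python
import asyncio


class StubMapper:
    """Hypothetical stand-in for the mapper: one async method, no database access."""

    async def tx_segment_to_genomic(self, transcript, exon_start=None, exon_end=None):
        # Yield control once, as a real I/O-bound call would while waiting.
        await asyncio.sleep(0)
        # Echo the arguments back; the real method returns a service model instead.
        return {"transcript": transcript, "exon_start": exon_start, "exon_end": exon_end}


async def map_many(mapper, transcripts):
    # Inside a coroutine, async methods are called with ``await``.
    # asyncio.gather schedules the calls concurrently and preserves input order.
    return await asyncio.gather(
        *(mapper.tx_segment_to_genomic(tx, exon_start=1, exon_end=2) for tx in transcripts)
    )


# From synchronous code, asyncio.run starts an event loop and runs the coroutine.
results = asyncio.run(map_many(StubMapper(), ["NM_152263.3", "NM_002529.3"]))
```

With the real library, the same pattern would presumably apply by passing `CoolSeqTool().ex_g_coords_mapper` in place of the stub.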
- async genomic_to_tx_segment(chromosome=None, genomic_ac=None, seg_start_genomic=None, seg_end_genomic=None, transcript=None, get_nearest_transcript_junction=False, gene=None)
Get transcript segment data for genomic data, lifted over to GRCh38.
If liftover to GRCh38 is unsuccessful, the response will contain errors.
Genomic positions must be given as inter-residue coordinates.
MANE Transcript data will be returned if and only if transcript is not supplied. gene must be given in order to retrieve MANE Transcript data.
>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> from cool_seq_tool.schemas import Strand
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> result = asyncio.run(
...     egc.genomic_to_tx_segment(
...         genomic_ac="NC_000001.11",
...         seg_start_genomic=154192135,
...         seg_end_genomic=154170399,
...         transcript="NM_152263.3",
...     )
... )
>>> result.seg_start.exon_ord, result.seg_end.exon_ord
(0, 7)
- Parameters:
chromosome (Optional[str]) – e.g. "1" or "chr1". If not provided, must provide genomic_ac. If genomic_ac is also provided, genomic_ac will be used.
genomic_ac (Optional[str]) – Genomic accession (e.g. NC_000001.11). If not provided, must provide chromosome. If chromosome is also provided, genomic_ac will be used.
seg_start_genomic (Optional[int]) – Genomic position where the transcript segment starts
seg_end_genomic (Optional[int]) – Genomic position where the transcript segment ends
transcript (Optional[str]) – The transcript to use. If this is not given, we will try the following transcripts: MANE Select, MANE Plus Clinical, Longest Remaining Compatible Transcript. See the Transcript Selection policy page.
get_nearest_transcript_junction (bool) – If True, this will return the adjacent exon if the position specified by seg_start_genomic or seg_end_genomic does not occur on an exon. For the positive strand, adjacent is defined as the exon preceding the breakpoint for the 5’ end and the exon following the breakpoint for the 3’ end. For the negative strand, adjacent is defined as the exon following the breakpoint for the 5’ end and the exon preceding the breakpoint for the 3’ end.
gene (Optional[str]) – Gene name, ideally an HGNC symbol. Must be given if no transcript value is provided.
coordinate_type – Coordinate type for seg_start_genomic and seg_end_genomic
- Return type:
- Returns:
Genomic data (inter-residue coordinates)
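The adjacent-exon rule described for get_nearest_transcript_junction can be sketched as a small standalone function. This is not the library's implementation. It assumes that "preceding" and "following" refer to genomic coordinate order, that exons are given as (start, end) inter-residue pairs, and that the caller states which end of the segment ("5p" or "3p") the breakpoint belongs to. The name `adjacent_exon` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end) in inter-residue genomic coordinates, start < end


def adjacent_exon(exons: List[Exon], position: int, strand: int, end: str) -> Optional[int]:
    """Return the index in ``exons`` of the exon adjacent to a non-exonic position.

    strand is 1 (positive) or -1 (negative); end is "5p" or "3p".
    Returns None when no exon lies on the required side of the position.
    """
    if strand not in (1, -1) or end not in ("5p", "3p"):
        raise ValueError("strand must be 1 or -1 and end must be '5p' or '3p'")

    # Positive strand: 5' end -> preceding exon, 3' end -> following exon.
    # Negative strand: 5' end -> following exon, 3' end -> preceding exon.
    want_preceding = (strand == 1) == (end == "5p")

    if want_preceding:
        # Closest exon that ends at or before the position.
        before = [(e_end, i) for i, (_, e_end) in enumerate(exons) if e_end <= position]
        return max(before)[1] if before else None

    # Closest exon that starts at or after the position.
    after = [(e_start, i) for i, (e_start, _) in enumerate(exons) if e_start >= position]
    return min(after)[1] if after else None


# Three toy exons; position 250 falls in the intron between the first two.
toy_exons = [(100, 200), (300, 400), (500, 600)]
```

Under these assumptions, `adjacent_exon(toy_exons, 250, 1, "5p")` picks the first exon and `adjacent_exon(toy_exons, 250, -1, "5p")` picks the second.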
- async tx_segment_to_genomic(transcript, gene=None, exon_start=None, exon_start_offset=0, exon_end=None, exon_end_offset=0)
Get aligned genomic data given transcript segment data.
By default, transcript data is aligned to the GRCh38 assembly.
>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> egc = CoolSeqTool().ex_g_coords_mapper
>>> tpm3 = asyncio.run(
...     egc.tx_segment_to_genomic(
...         "NM_152263.3",
...         gene="TPM3",
...         exon_start=1,
...         exon_end=8,
...     )
... )
>>> (
...     tpm3.genomic_ac,
...     tpm3.seg_start.genomic_location.end,
...     tpm3.seg_end.genomic_location.start,
... )
('NC_000001.11', 154192135, 154170399)
- Parameters:
transcript (str) – RefSeq transcript accession
gene (Optional[str]) – HGNC gene symbol
exon_start (Optional[int]) – Starting transcript exon number (1-based). If not provided, must provide exon_end
exon_start_offset (int) – Starting exon offset
exon_end (Optional[int]) – Ending transcript exon number (1-based). If not provided, must provide exon_start
exon_end_offset (int) – Ending exon offset
- Return type:
- Returns:
GRCh38 genomic data (inter-residue coordinates)
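Because results are reported in inter-residue coordinates, it can help to see how they relate to the 1-based, inclusive residue coordinates used by many genome browsers. The helpers below are a minimal sketch of the standard conversion and are not part of cool_seq_tool. Both function names are hypothetical.

```python
from typing import Tuple


def inter_residue_to_residue(start: int, end: int) -> Tuple[int, int]:
    """Convert an inter-residue span to 1-based inclusive residue coordinates."""
    # A zero-width inter-residue span (start == end) sits between two residues
    # and has no residue-coordinate equivalent, so it is rejected here.
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def residue_to_inter_residue(start: int, end: int) -> Tuple[int, int]:
    """Convert 1-based inclusive residue coordinates to an inter-residue span."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


# The first three residues of a sequence: inter-residue (0, 3) is residues 1..3.
first_three = inter_residue_to_residue(0, 3)
```

The span length is `end - start` in inter-residue coordinates and `end - start + 1` in residue coordinates, which is a quick way to check which system a pair of numbers is in.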
- class cool_seq_tool.mappers.exon_genomic_coords.GenomicTxSeg(**data)
Model for representing a boundary for a transcript segment.
- classmethod check_errors(values)
Ensure that fields are (un)set depending on errors
- Parameters:
values (dict) – Values in model
- Raises:
ValueError – If seg, genomic_ac and tx_ac are not provided when there are no errors
- Return type:
dict
- Returns:
Values in model
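The validation rule above can be sketched as a plain function over a dict. This is an illustration only, not the model's actual validator. The documented behaviour is the ValueError when there are no errors and seg, genomic_ac or tx_ac is missing. The second branch (rejecting data fields when errors are present) is an assumption drawn from the phrase "(un)set depending on errors", and the `errors` key name is likewise assumed.

```python
from typing import Any, Dict

# Fields the documentation says must be provided when there are no errors.
REQUIRED_WHEN_OK = ("seg", "genomic_ac", "tx_ac")


def check_errors(values: Dict[str, Any]) -> Dict[str, Any]:
    """Validate that data fields are (un)set consistently with ``errors``."""
    if values.get("errors"):
        # Assumption: when errors are reported, data fields should be unset.
        unexpected = [name for name in REQUIRED_WHEN_OK if values.get(name) is not None]
        if unexpected:
            raise ValueError(f"{unexpected} must not be set when errors are present")
    else:
        # Documented rule: without errors, all of these fields must be provided.
        missing = [name for name in REQUIRED_WHEN_OK if values.get(name) is None]
        if missing:
            raise ValueError(f"{missing} must be provided when there are no errors")
    return values


# A complete record passes through unchanged.
ok = check_errors({"seg": {"exon_ord": 0}, "genomic_ac": "NC_000001.11", "tx_ac": "NM_152263.3"})
```

A record that only carries errors, such as `{"errors": ["liftover failed"]}`, also passes under these assumptions, while an empty dict raises ValueError.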
- class cool_seq_tool.mappers.exon_genomic_coords.GenomicTxSegService(**data)
Service model for genomic and transcript data.
- classmethod add_meta_check_errors(values)
Add service metadata to model and ensure that fields are (un)set depending on errors
- Parameters:
values (dict) – Values in model
- Raises:
ValueError – If genomic_ac, tx_ac and seg_start or seg_end are not provided when there are no errors
- Return type:
dict
- Returns:
Values in model, including service metadata
- model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- service_meta: ServiceMeta