cool_seq_tool.mappers.alignment

Module containing alignment methods for translating to and from different reference sequences.

class cool_seq_tool.mappers.alignment.AlignmentMapper(seqrepo_access, transcript_mappings, uta_db)

Class for translating between reference sequences in the direction protein (p) -> cDNA (c) -> genomic (g).

__init__(seqrepo_access, transcript_mappings, uta_db)

Initialize the AlignmentMapper class.

Parameters:
  • seqrepo_access (SeqRepoAccess) – Access to seqrepo queries

  • transcript_mappings (TranscriptMappings) – Access to transcript accession mappings and conversions

  • uta_db (UtaDatabase) – UtaDatabase instance to give access to query UTA database

async c_to_g(c_ac, c_start_pos, c_end_pos, cds_start=None, residue_mode=ResidueMode.RESIDUE, target_genome_assembly=Assembly.GRCH38)

Translate cDNA representation to genomic representation.

Parameters:
  • c_ac (str) – cDNA RefSeq accession

  • c_start_pos (int) – cDNA start position for codon

  • c_end_pos (int) – cDNA end position for codon

  • cds_start (Optional[int]) – Coding start site. If not provided, this will be computed.

  • residue_mode (ResidueMode) – Residue mode for c_start_pos and c_end_pos

  • target_genome_assembly (Assembly) – Genome assembly to get genomic data for

Return type:

Tuple[Optional[Dict], Optional[str]]

Returns:

Tuple containing:

  • Genomic representation (ac, positions) if able to translate, with positions given as inter-residue coordinates. Else None.

  • Warning, if unable to translate to genomic representation. Else None.
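The role of cds_start and the inter-residue output can be illustrated with a small sketch. It assumes that cDNA positions are counted from the coding start site, so that adding cds_start gives positions relative to the start of the transcript. The helper names to_inter_residue and offset_by_cds_start, the string residue-mode flag, and the numeric values are hypothetical and are not part of cool_seq_tool; the library's own implementation may differ.

```python
from typing import Tuple


def to_inter_residue(start: int, end: int, residue_mode: str = "residue") -> Tuple[int, int]:
    """Convert a 1-based inclusive range to inter-residue (0-based, half-open)."""
    if residue_mode == "residue":
        # Only the start shifts: residues 5..7 span inter-residue 4..7
        return start - 1, end
    # Already inter-residue: nothing to do
    return start, end


def offset_by_cds_start(c_start: int, c_end: int, cds_start: int) -> Tuple[int, int]:
    """Shift CDS-relative positions so they are relative to the transcript start."""
    return c_start + cds_start, c_end + cds_start


# Hypothetical values: a codon at c.1798_1800 on a transcript
# whose coding sequence is assumed to start at offset 226
c_pos = to_inter_residue(1798, 1800)
tx_pos = offset_by_cds_start(c_pos[0], c_pos[1], cds_start=226)
print(c_pos, tx_pos)  # (1797, 1800) (2023, 2026)
```

Keeping the conversion to inter-residue coordinates separate from the offset makes each step easy to check by hand.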

async p_to_c(p_ac, p_start_pos, p_end_pos, residue_mode=ResidueMode.RESIDUE)

Translate protein representation to cDNA representation.

Parameters:
  • p_ac (str) – Protein RefSeq accession

  • p_start_pos (int) – Protein start position

  • p_end_pos (int) – Protein end position

  • residue_mode (ResidueMode) – Residue mode for p_start_pos and p_end_pos

Return type:

Tuple[Optional[Dict], Optional[str]]

Returns:

Tuple containing:

  • cDNA representation (accession, codon range positions for corresponding change, cds start site) if able to translate. Will return positions as inter-residue coordinates. If unable to translate, returns None.

  • Warning, if unable to translate to cDNA representation. Else None.
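The "codon range positions" follow from the fact that each amino acid is encoded by three nucleotides. The sketch below shows only that arithmetic, assuming positions are relative to the coding start site. The function name protein_to_codon_range and the string residue-mode flag are hypothetical; this is not the library's implementation.

```python
from typing import Tuple


def protein_to_codon_range(
    p_start: int, p_end: int, residue_mode: str = "residue"
) -> Tuple[int, int]:
    """Map a protein range to the inter-residue cDNA range of its codons."""
    if residue_mode == "residue":
        # 1-based inclusive -> inter-residue (0-based, half-open)
        p_start -= 1
    # Each amino acid spans three nucleotides
    return p_start * 3, p_end * 3


# Amino acid 600 alone covers inter-residue cDNA positions 1797..1800,
# which is c.1798_1800 in 1-based residue coordinates
print(protein_to_codon_range(600, 600))  # (1797, 1800)
```

For a multi-residue range the same rule applies: protein positions 1 to 2 in residue mode map to the cDNA interval (0, 6).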

async p_to_g(p_ac, p_start_pos, p_end_pos, residue_mode=ResidueMode.INTER_RESIDUE, target_genome_assembly=Assembly.GRCH38)

Translate protein representation to genomic representation, by way of intermediary conversion into cDNA coordinates.

Parameters:
  • p_ac (str) – Protein RefSeq accession

  • p_start_pos (int) – Protein start position

  • p_end_pos (int) – Protein end position

  • residue_mode (ResidueMode) – Residue mode for p_start_pos and p_end_pos.

  • target_genome_assembly (Assembly) – Genome assembly to get genomic data for

Return type:

Tuple[Optional[Dict], Optional[str]]

Returns:

Tuple containing:

  • Genomic representation (ac, positions) if able to translate, with positions given as inter-residue coordinates. Else None.

  • Warnings, if conversion to cDNA or genomic coordinates fails. Else None.
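The two-step conversion and its (result, warning) return convention can be sketched with stand-in coroutines. Everything below is illustrative: fake_p_to_c, fake_c_to_g, p_to_g_sketch, the dictionary keys, the accessions, and the numbers are made up, and no seqrepo or UTA access is involved. The point is only the control flow, where a failure in the first step is returned as a warning without attempting the second.

```python
import asyncio
from typing import Dict, Optional, Tuple

Result = Tuple[Optional[Dict], Optional[str]]


async def fake_p_to_c(p_ac: str, p_start: int, p_end: int) -> Result:
    """Stand-in for a protein -> cDNA step; returns made-up data."""
    if not p_ac.startswith("NP_"):
        return None, f"{p_ac} not found in transcript mappings"
    c_data = {
        "c_ac": "NM_000000.0",            # placeholder accession
        "c_start_pos": (p_start - 1) * 3,  # inter-residue codon start
        "c_end_pos": p_end * 3,
        "cds_start": 100,                  # placeholder coding start site
    }
    return c_data, None


async def fake_c_to_g(c_ac: str, c_start: int, c_end: int, cds_start: int) -> Result:
    """Stand-in for a cDNA -> genomic step; the +5000 offset is arbitrary."""
    base = 5000 + cds_start
    return {"g_ac": "NC_000000.0", "g_start_pos": base + c_start,
            "g_end_pos": base + c_end}, None


async def p_to_g_sketch(p_ac: str, p_start: int, p_end: int) -> Result:
    """Chain both steps, stopping at the first warning."""
    c_data, warning = await fake_p_to_c(p_ac, p_start, p_end)
    if c_data is None:
        return None, warning
    return await fake_c_to_g(
        c_data["c_ac"], c_data["c_start_pos"], c_data["c_end_pos"], c_data["cds_start"]
    )


g_data, warning = asyncio.run(p_to_g_sketch("NP_000000.0", 600, 600))
failed, failed_warning = asyncio.run(p_to_g_sketch("XX_000000.0", 600, 600))
```

Callers following this pattern would check the first element for None before using it, and surface the second element to the user when it is set.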