cool_seq_tool.sources.transcript_mappings

Provide mappings between gene symbols and RefSeq + Ensembl transcript accessions.

class cool_seq_tool.sources.transcript_mappings.TranscriptMappings(transcript_file_path=TRANSCRIPT_MAPPINGS_PATH, lrg_refseqgene_path=LRG_REFSEQGENE_PATH)

Provide mappings between gene symbols and RefSeq + Ensembl transcript accessions.

Uses LRG_RefSeqGene and transcript_mappings.csv, which will automatically be acquired if they aren’t already available. See the configuration section in the documentation for information about manual acquisition of data.

In general, this class’s methods expect NCBI gene symbols, so users should check where their input comes from when a term is conflicted or ambiguous (such cases should be relatively rare).

__init__(transcript_file_path=TRANSCRIPT_MAPPINGS_PATH, lrg_refseqgene_path=LRG_REFSEQGENE_PATH)

Initialize the transcript mappings class.

Parameters:
  • transcript_file_path (Path) – Path to transcript mappings file

  • lrg_refseqgene_path (Path) – Path to LRG RefSeqGene file

coding_dna_transcripts(identifier)

Return coding DNA (cDNA) transcripts for a gene symbol.

Parameters:

identifier (str) – Gene identifier to find transcripts for

Return type:

List[str]

Returns:

cDNA transcripts for a gene symbol
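
Because the class maps gene symbols to both RefSeq and Ensembl accessions, a caller may want to separate a returned list by source. The following is a minimal sketch, assuming the list can mix the two kinds; `split_by_source` is a hypothetical helper, not part of cool_seq_tool, and the input strings are illustrative rather than actual output of `coding_dna_transcripts`.

```python
from typing import Dict, List


def split_by_source(accessions: List[str]) -> Dict[str, List[str]]:
    """Group accession strings by their apparent source (hypothetical helper).

    Assumes the usual conventions: Ensembl stable IDs start with "ENS",
    RefSeq accessions start with two uppercase letters and an underscore
    (for example NM_ or NP_). Anything else is collected under "other".
    """
    groups: Dict[str, List[str]] = {"refseq": [], "ensembl": [], "other": []}
    for acc in accessions:
        if acc.startswith("ENS"):
            groups["ensembl"].append(acc)
        elif len(acc) > 3 and acc[2] == "_" and acc[:2].isalpha() and acc[:2].isupper():
            groups["refseq"].append(acc)
        else:
            groups["other"].append(acc)
    return groups


# Illustrative input only; not taken from the library's data files.
example = ["NM_004333.6", "ENST00000288602.11", "foo"]
grouped = split_by_source(example)
```

With a real instance, the input would come from a call such as `TranscriptMappings().coding_dna_transcripts("BRAF")`.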

get_gene_symbol_from_ensembl_protein(q)

Return the gene symbol for an Ensembl protein.

Parameters:

q (str) – Ensembl protein accession

Return type:

Optional[str]

Returns:

Gene symbol

get_gene_symbol_from_ensembl_transcript(q)

Return the gene symbol for an Ensembl transcript.

Parameters:

q (str) – Ensembl transcript accession

Return type:

Optional[str]

Returns:

Gene symbol

get_gene_symbol_from_refeq_protein(q)

Return the gene symbol for a RefSeq protein.

Parameters:

q (str) – RefSeq protein accession

Return type:

Optional[str]

Returns:

Gene symbol

get_gene_symbol_from_refseq_rna(q)

Return the gene symbol for a RefSeq RNA transcript.

Parameters:

q (str) – RefSeq RNA transcript accession

Return type:

Optional[str]

Returns:

Gene symbol
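
Each of the four `get_gene_symbol_from_*` methods above handles one accession type, so code that receives mixed accessions needs to pick the right one. The sketch below routes by accession prefix. `gene_symbol_for_accession` and `StubMappings` are hypothetical names introduced here for illustration; the stub only mimics the method names listed above (including the `refeq` spelling as shown in the signature) so the example runs without the library or its data files. The accession is passed through unchanged, since whether the real methods expect versioned accessions is not stated here.

```python
from typing import Any, List, Optional


def gene_symbol_for_accession(mappings: Any, accession: str) -> Optional[str]:
    """Route an accession to a lookup method by prefix (hypothetical helper)."""
    if accession.startswith("NP_"):
        return mappings.get_gene_symbol_from_refeq_protein(accession)
    if accession.startswith(("NM_", "NR_")):
        return mappings.get_gene_symbol_from_refseq_rna(accession)
    if accession.startswith("ENSP"):
        return mappings.get_gene_symbol_from_ensembl_protein(accession)
    if accession.startswith("ENST"):
        return mappings.get_gene_symbol_from_ensembl_transcript(accession)
    # Unrecognized prefix: no lookup is attempted.
    return None


class StubMappings:
    """Stand-in with the same four method names; not the real class."""

    def __init__(self) -> None:
        self.calls: List[str] = []  # records which method was used

    def get_gene_symbol_from_refeq_protein(self, q: str) -> Optional[str]:
        self.calls.append("refseq_protein")
        return "BRAF"

    def get_gene_symbol_from_refseq_rna(self, q: str) -> Optional[str]:
        self.calls.append("refseq_rna")
        return "BRAF"

    def get_gene_symbol_from_ensembl_protein(self, q: str) -> Optional[str]:
        self.calls.append("ensembl_protein")
        return "BRAF"

    def get_gene_symbol_from_ensembl_transcript(self, q: str) -> Optional[str]:
        self.calls.append("ensembl_transcript")
        return "BRAF"


stub = StubMappings()
symbol = gene_symbol_for_accession(stub, "NP_004324.2")
```

In real use, a `TranscriptMappings()` instance would take the place of the stub. Because the methods return `Optional[str]`, callers should handle `None` for accessions that are not found.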

protein_transcripts(identifier)

Return a list of protein transcripts for a gene symbol.

>>> from cool_seq_tool.sources import TranscriptMappings
>>> braf_txs = TranscriptMappings().protein_transcripts("BRAF")
>>> braf_txs.sort()
>>> braf_txs[-1]
'NP_004324.2'

Parameters:

identifier (str) – Gene identifier to get protein transcripts for

Return type:

List[str]

Returns:

Protein transcripts for a gene symbol
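
The doctest above sorts the returned strings lexicographically, which places a version suffix such as `.10` before `.2`. Where version order matters, a version-aware sort key may be preferable. This is a minimal sketch: `accession_sort_key` and `sort_accessions` are hypothetical helpers, not part of cool_seq_tool, and the `.10` version in the input is made up purely to show the ordering difference.

```python
from typing import List, Tuple


def accession_sort_key(accession: str) -> Tuple[str, int]:
    """Split 'NP_004324.2' into ('NP_004324', 2) for sorting.

    Accessions without a numeric version suffix get version 0.
    """
    base, _, version = accession.partition(".")
    return base, int(version) if version.isdigit() else 0


def sort_accessions(accessions: List[str]) -> List[str]:
    """Return a new list ordered by base accession, then numeric version."""
    return sorted(accessions, key=accession_sort_key)


# Illustrative input; the ".10" version is invented for the comparison.
txs = ["NP_004324.10", "NP_004324.2"]
plain = sorted(txs)               # lexicographic: ".10" sorts before ".2"
by_version = sort_accessions(txs)  # numeric: ".2" sorts before ".10"
```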