cool_seq_tool.handlers.seqrepo_access
Wrap SeqRepo to provide additional lookup and identification methods on top of basic dereferencing functions.
- class cool_seq_tool.handlers.seqrepo_access.SeqRepoAccess(sr)
Wrapper around the base SeqRepoDataProxy class from VRS-Python that provides additional lookup and identification methods.
- ac_to_chromosome(ac)
Get chromosome for accession.
- Parameters:
ac (str) – Accession
- Return type:
Tuple[Optional[str], Optional[str]]
- Returns:
Chromosome, warning
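Several methods on this class return a (value, warning) pair instead of raising an exception. The following is a minimal sketch of one way to consume such a pair. `unwrap` is a hypothetical helper, not part of cool_seq_tool, and the tuples passed to it are stand-in values shaped like the documented return type rather than real SeqRepo output.

```python
from typing import Optional, Tuple, TypeVar

T = TypeVar("T")


def unwrap(result: Tuple[Optional[T], Optional[str]]) -> T:
    """Return the value from a (value, warning) pair, or raise if it is missing.

    Hypothetical helper for illustration; not part of cool_seq_tool.
    """
    value, warning = result
    if value is None:
        # Surface the warning text as the exception message when one is given.
        raise LookupError(warning or "lookup returned no value")
    return value


# Stand-in tuples shaped like Tuple[Optional[str], Optional[str]].
print(unwrap(("7", None)))
try:
    unwrap((None, "stand-in warning message"))
except LookupError as err:
    print(f"lookup failed: {err}")
```

With a real instance, the same helper could wrap a call such as `unwrap(sr.ac_to_chromosome(ac))`.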
- chromosome_to_acs(chromosome)
Get accessions for a chromosome.
- Parameters:
chromosome (str) – Chromosome number. Must be either 1-22, X, or Y
- Return type:
Tuple[Optional[List[str]], Optional[str]]
- Returns:
Accessions for chromosome (ordered by latest assembly), warning
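The chromosome constraint above (1-22, X, or Y) can be checked before calling the method. This is a minimal sketch of such a pre-check. `VALID_CHROMOSOMES` and `is_valid_chromosome` are hypothetical names, and the exact-match rule (no "chr" prefix, uppercase X and Y only) is an assumption of this sketch rather than documented library behavior.

```python
# Hypothetical pre-check; cool_seq_tool may validate input differently.
VALID_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def is_valid_chromosome(chromosome: str) -> bool:
    """Return True if the string is exactly one of 1-22, X, or Y."""
    return chromosome in VALID_CHROMOSOMES


for candidate in ("7", "X", "23", "chr7"):
    print(candidate, is_valid_chromosome(candidate))
```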
- get_fasta_file(sequence_id, outfile_path)
Retrieve FASTA file containing sequence for requested sequence ID.
>>> from pathlib import Path
>>> from cool_seq_tool.handlers import SeqRepoAccess
>>> from biocommons.seqrepo import SeqRepo
>>> sr = SeqRepoAccess(SeqRepo("/usr/local/share/seqrepo/latest"))
>>> # write to local file tpm3.fasta:
>>> sr.get_fasta_file("NM_002529.3", Path("tpm3.fasta"))
FASTA file headers will include GA4GH sequence digest, Ensembl accession ID, and RefSeq accession ID.
- Parameters:
sequence_id (str) – Accession ID, sans namespace, e.g. NM_152263.3
outfile_path (Path) – Path to save file to
- Return type:
None
- Returns:
None, but saves sequence data to outfile_path if successful
- Raises:
KeyError if SeqRepo doesn’t have sequence data for the given ID
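Because get_fasta_file signals a missing sequence with KeyError, callers may want to handle that case explicitly. Below is a minimal sketch of one way to do so. `try_save_fasta` is a hypothetical wrapper, not part of cool_seq_tool, and it accepts any object that exposes a `get_fasta_file(sequence_id, outfile_path)` method.

```python
from pathlib import Path
from typing import Any


def try_save_fasta(sr: Any, sequence_id: str, outfile_path: Path) -> bool:
    """Write a FASTA file and report success instead of propagating KeyError.

    Hypothetical wrapper for illustration; `sr` is expected to behave like
    SeqRepoAccess, i.e. raise KeyError when the sequence ID is unknown.
    """
    try:
        sr.get_fasta_file(sequence_id, outfile_path)
    except KeyError:
        # SeqRepo has no sequence data for this ID.
        return False
    return True
```

With a real instance this might be called as `try_save_fasta(sr, "NM_152263.3", Path("out.fasta"))`, where the output file name is arbitrary.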
- get_reference_sequence(ac, start=None, end=None, residue_mode=ResidueMode.RESIDUE)
Get reference sequence for an accession given a start and end position. If start and end are not given, returns the entire reference sequence.
>>> from cool_seq_tool.handlers import SeqRepoAccess
>>> from biocommons.seqrepo import SeqRepo
>>> sr = SeqRepoAccess(SeqRepo("/usr/local/share/seqrepo/latest"))
>>> sr.get_reference_sequence("NM_002529.3", 1, 10)[0]
'TGCAGCTGG'
>>> sr.get_reference_sequence("NP_001341538.1", 1, 10)[0]
'MAALSGGGG'
- Parameters:
ac (str) – Accession
start (Optional[int]) – Start position of the change
end (Optional[int]) – End position of the change. If None and start is given, end is assumed to have the same value as start.
residue_mode (ResidueMode) – Residue mode for start and end
- Return type:
Tuple[str, Optional[str]]
- Returns:
Sequence at the given position (empty string if the accession or positions do not exist), warning if any
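Both example outputs above are nine characters long for start=1 and end=10. That is consistent with a 1-based inclusive start and an exclusive end under the default residue mode, although the documentation does not state this explicitly. The sketch below illustrates that interpretation on a synthetic string. `slice_like_example` is a hypothetical function and does not call cool_seq_tool.

```python
def slice_like_example(sequence: str, start: int, end: int) -> str:
    """Slice with a 1-based inclusive start and an exclusive end.

    Assumption inferred from the documented example outputs; this is not
    the library's implementation.
    """
    return sequence[start - 1 : end - 1]


synthetic = "ABCDEFGHIJKL"  # made-up sequence, not real data
result = slice_like_example(synthetic, 1, 10)
print(result, len(result))
```

Under this assumption the call returns the first nine characters of the string, matching the length of the documented outputs.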
- translate_alias(input_str)
Get aliases for a given input.
- Parameters:
input_str (str) – Input to get aliases for
- Return type:
Tuple[List[Optional[str]], Optional[str]]
- Returns:
List of aliases, warning
- translate_identifier(ac, target_namespaces=None)
Return list of identifiers for accession.
>>> from cool_seq_tool.handlers import SeqRepoAccess
>>> from biocommons.seqrepo import SeqRepo
>>> sr = SeqRepoAccess(SeqRepo("/usr/local/share/seqrepo/latest"))
>>> sr.translate_identifier("NM_002529.3")[0]
['MD5:18f0a6e3af9e1bbd8fef1948c7156012', 'NCBI:NM_002529.3', 'refseq:NM_002529.3', 'SEGUID:dEJQBkga9d9VeBHTyTbg6JEtTGQ', 'SHA1:74425006481af5df557811d3c936e0e8912d4c64', 'VMC:GS_RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA', 'sha512t24u:RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA', 'ga4gh:SQ.RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA']
>>> sr.translate_identifier("NM_002529.3", "ga4gh")[0]
['ga4gh:SQ.RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA']
- Parameters:
ac (str) – Identifier accession
target_namespaces – The namespace(s) of identifier to return
- Return type:
Tuple[List[str], Optional[str]]
- Returns:
List of identifiers, warning
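The identifiers in the example above are namespace-prefixed strings of the form `namespace:value`. The sketch below shows the kind of result that restricting by namespace produces, using the identifiers from that example as plain strings. `filter_by_namespace` is a hypothetical function for illustration; it does not call SeqRepo and is not how the library performs the lookup.

```python
from typing import Iterable, List, Union


def filter_by_namespace(
    identifiers: Iterable[str], namespaces: Union[str, Iterable[str]]
) -> List[str]:
    """Keep identifiers whose prefix before the first ':' is a wanted namespace.

    Hypothetical illustration; not the library's implementation.
    """
    wanted = {namespaces} if isinstance(namespaces, str) else set(namespaces)
    return [i for i in identifiers if i.split(":", 1)[0] in wanted]


# Identifiers copied from the documented translate_identifier example.
identifiers = [
    "MD5:18f0a6e3af9e1bbd8fef1948c7156012",
    "NCBI:NM_002529.3",
    "refseq:NM_002529.3",
    "SEGUID:dEJQBkga9d9VeBHTyTbg6JEtTGQ",
    "SHA1:74425006481af5df557811d3c936e0e8912d4c64",
    "VMC:GS_RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA",
    "sha512t24u:RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA",
    "ga4gh:SQ.RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA",
]

print(filter_by_namespace(identifiers, "ga4gh"))
print(filter_by_namespace(identifiers, ["NCBI", "refseq"]))
```

Accepting either a single string or an iterable mirrors the "namespace(s)" wording of the parameter description, though the accepted types of the real parameter are not documented here.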