Usage
Cool-Seq-Tool provides easy access to, and useful operations on, a selection of important genomic resources. Modules are divided into three groups:
Data sources, for basic acquisition and setup for a data source via Python
Data handlers, for additional operations on top of existing sources
Data mappers, for functions that incorporate multiple sources/handlers to produce output
The core CoolSeqTool class initializes these sources, handlers, and mappers and exposes each one as an attribute, so it can be used for easy initialization and access:
>>> from cool_seq_tool import CoolSeqTool
>>> cst = CoolSeqTool()
>>> cst.seqrepo_access.translate_alias("NM_002529.3")[0][-1]
'ga4gh:SQ.RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA'
>>> cst.transcript_mappings.ensembl_protein_for_gene_symbol["BRAF"][0]
'ENSP00000419060'
>>> await cst.uta_db.get_ac_from_gene("BRAF")
['NC_000007.14', 'NC_000007.13']
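To make the three-layer organization more concrete, here is a toy sketch in plain Python. Every class in it (ToySequenceSource, ToyGeneSource, ToySequenceHandler, ToyMapper, ToyTool) is hypothetical and does not correspond to any real Cool-Seq-Tool module. It only mimics the pattern: a source owns raw data, a handler adds operations on a source, a mapper combines several of them, and one top-level object (playing the role CoolSeqTool plays) builds everything once and exposes it as attributes.

```python
# Toy illustration only: none of these classes exist in Cool-Seq-Tool.


class ToySequenceSource:
    """Hypothetical data source: owns raw data (here, an in-memory dict)."""

    def __init__(self) -> None:
        self._sequences = {"toy:tx1": "ATGGCCTAA", "toy:tx2": "ATGAAATGA"}

    def get_sequence(self, accession: str) -> str:
        return self._sequences[accession]


class ToyGeneSource:
    """Hypothetical second data source: gene symbol -> transcript accession."""

    def __init__(self) -> None:
        self._gene_to_tx = {"GENE1": "toy:tx1", "GENE2": "toy:tx2"}

    def get_transcript(self, gene: str) -> str:
        return self._gene_to_tx[gene]


class ToySequenceHandler:
    """Hypothetical handler: adds an operation on top of an existing source."""

    def __init__(self, source: ToySequenceSource) -> None:
        self.source = source

    def subsequence(self, accession: str, start: int, end: int) -> str:
        # 0-based, end-exclusive slice of the stored sequence
        return self.source.get_sequence(accession)[start:end]


class ToyMapper:
    """Hypothetical mapper: combines a source and a handler into one answer."""

    def __init__(self, genes: ToyGeneSource, handler: ToySequenceHandler) -> None:
        self.genes = genes
        self.handler = handler

    def first_codon_for_gene(self, gene: str) -> str:
        accession = self.genes.get_transcript(gene)
        return self.handler.subsequence(accession, 0, 3)


class ToyTool:
    """Hypothetical facade: builds each component once, exposes them as attributes."""

    def __init__(self) -> None:
        self.sequence_source = ToySequenceSource()
        self.gene_source = ToyGeneSource()
        self.sequence_handler = ToySequenceHandler(self.sequence_source)
        self.mapper = ToyMapper(self.gene_source, self.sequence_handler)


tool = ToyTool()
print(tool.mapper.first_codon_for_gene("GENE2"))  # ATG
print(tool.sequence_handler.subsequence("toy:tx1", 3, 6))  # GCC
```

Building each component once in the facade and passing it down means the mapper and the handler share the same source object instead of each loading its own copy of the data.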
Descriptions and examples of functions can be found in the API Reference section.
Note
Many component classes in Cool-Seq-Tool, including UtaDatabase, ExonGenomicCoordsMapper, and ManeTranscript, define their public methods as async. Calling one of these methods returns a coroutine rather than a result, so inside another function the call must be awaited, and that function must itself be declared with async def:
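import asyncio

from cool_seq_tool import CoolSeqTool

async def do_thing():
    mane_mapper = CoolSeqTool().mane_transcript
    # Calling an async method without await only creates a coroutine object
    result = mane_mapper.g_to_grch38("NC_000001.11", 100, 200)
    print(type(result))
    # <class 'coroutine'>
    # Awaiting the coroutine runs it and returns the actual result
    awaited_result = await result
    print(awaited_result)
    # {'ac': 'NC_000001.11', 'pos': (100, 200)}

asyncio.run(do_thing())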
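Outside of an async function, for example in a standard Python REPL, asyncio.run() can be used to run a coroutine to completion. Many of our docstring examples use this pattern. Async-aware shells such as IPython, or the REPL started with python -m asyncio, also accept a bare top-level await, as in the get_ac_from_gene example above.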
>>> import asyncio
>>> from cool_seq_tool import CoolSeqTool
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> result = asyncio.run(mane_mapper.g_to_grch38("NC_000001.11", 100, 200))
>>> print(result)
{'ac': 'NC_000001.11', 'pos': (100, 200)}
See the asyncio module documentation for more information.
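The difference between creating a coroutine and awaiting it can be tried without any data sources installed. In the sketch below, fake_g_to_grch38 is a hypothetical stand-in, not part of Cool-Seq-Tool: it performs no lookup and simply echoes its inputs in the same shape as the example output above.

```python
import asyncio


# Hypothetical stand-in for an async method such as ManeTranscript.g_to_grch38.
# It does no real mapping; it just echoes its inputs.
async def fake_g_to_grch38(ac: str, start: int, end: int) -> dict:
    await asyncio.sleep(0)  # yield control, as real I/O-bound code would
    return {"ac": ac, "pos": (start, end)}


async def do_thing() -> dict:
    coro = fake_g_to_grch38("NC_000001.11", 100, 200)
    print(type(coro).__name__)  # coroutine
    # Inside an async function, await runs the coroutine and gives its value
    return await coro


# Outside any async function (a script or plain REPL), asyncio.run() drives
# the coroutine to completion on a fresh event loop
result = asyncio.run(do_thing())
print(result)  # {'ac': 'NC_000001.11', 'pos': (100, 200)}
```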
Environment configuration
Individual classes accept arguments on initialization that set parameters for their data sources. In general, these parameters can also be configured via environment variables, e.g. in a cloud deployment.
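The general pattern of an initialization argument backed by an environment variable can be sketched as follows. The precedence order (explicit argument, then environment variable, then a built-in default), the helper resolve_setting, the class ToyDataSource, and the variable name EXAMPLE_DATA_PATH are all assumptions for illustration; check the API Reference for each real class's parameters and defaults.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_setting(explicit: Optional[str], env_var: str, default: str) -> str:
    # Assumed precedence: explicit argument > environment variable > default
    if explicit is not None:
        return explicit
    return os.environ.get(env_var, default)


class ToyDataSource:
    """Hypothetical class; EXAMPLE_DATA_PATH is a made-up variable name."""

    def __init__(self, data_path: Optional[str] = None) -> None:
        self.data_path = Path(
            resolve_setting(data_path, "EXAMPLE_DATA_PATH", "/tmp/example_default.txt")
        )


# Simulate a deployment that sets the variable (normally done outside Python)
os.environ["EXAMPLE_DATA_PATH"] = "/srv/data/from_env.txt"
print(ToyDataSource().data_path)  # path taken from the environment
print(ToyDataSource("/home/me/override.txt").data_path)  # explicit argument wins
```

Keeping the explicit argument first lets tests and notebooks override a deployment's environment without modifying it.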
| Variable | Description |
|---|---|
| | Path to LRG_RefSeqGene file. Used in … |
| | Path to transcript mapping file generated from Ensembl BioMart. Used in … |
| | Path to MANE Summary file. Used in … |
| | Path to SeqRepo directory (i.e. contains … |
| | A libpq connection URI, i.e. of the form … |
| | A path to a chainfile for lifting from GRCh37 to GRCh38. Used by the … |
| | A path to a chainfile for lifting from GRCh38 to GRCh37. Used by the … |
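For reference, PostgreSQL's libpq accepts connection URIs of the general shape postgresql://[user[:password]@][host][:port][/dbname][?param=value...], and the postgres:// scheme is also accepted. The sketch below uses only the standard library's urllib.parse.urlsplit to pull apart a single-host URI, which can help when checking a value before exporting it. The helper describe_libpq_uri and the example URI are hypothetical. The sketch does not cover libpq features such as multi-host lists or percent-encoded socket paths.

```python
from urllib.parse import urlsplit


def describe_libpq_uri(uri: str) -> dict:
    """Split a single-host libpq URI into parts (password deliberately omitted)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"not a libpq connection URI: {uri!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # int, or None if no port was given
        "dbname": parts.path.lstrip("/") or None,
    }


# Placeholder URI; the user, password, host, and database are made up
print(describe_libpq_uri("postgresql://example_user:example_pw@localhost:5432/example_db"))
# {'user': 'example_user', 'host': 'localhost', 'port': 5432, 'dbname': 'example_db'}
```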