Usage

Cool-Seq-Tool provides easy access to, and useful operations on, a selection of important genomic resources. Modules are divided into three groups:

  • Data sources, for basic acquisition and setup of a data source via Python

  • Data handlers, for additional operations on top of existing sources

  • Data mappers, for functions that incorporate multiple sources/handlers to produce output

The core CoolSeqTool class initializes all of these modules and exposes them as attributes, providing a single point of access:

>>> from cool_seq_tool.app import CoolSeqTool
>>> cst = CoolSeqTool()
>>> cst.seqrepo_access.translate_alias("NM_002529.3")[0][-1]
'ga4gh:SQ.RSkww1aYmsMiWbNdNnOTnVDAM3ZWp1uA'
>>> cst.transcript_mappings.ensembl_protein_for_gene_symbol["BRAF"][0]
'ENSP00000419060'
>>> import asyncio
>>> asyncio.run(cst.uta_db.get_ac_from_gene("BRAF"))
['NC_000007.14', 'NC_000007.13']

Descriptions and examples of functions can be found in the API Reference section.

Note

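Many component classes in CoolSeqTool, including UtaDatabase, ExonGenomicCoordsMapper, and ManeTranscript, define their public methods as async. Calling one of these methods returns a coroutine object rather than a result, so inside another function the call must be awaited, and the enclosing function must itself be declared with async def: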

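import asyncio

from cool_seq_tool.app import CoolSeqTool

async def do_thing():
    mane_mapper = CoolSeqTool().mane_transcript
    # Calling the async method only creates a coroutine; no lookup happens yet
    result = mane_mapper.g_to_grch38("NC_000001.11", 100, 200)
    print(type(result))
    # <class 'coroutine'>
    # Awaiting the coroutine runs the lookup and returns its value
    awaited_result = await result
    print(awaited_result)
    # {'ac': 'NC_000001.11', 'pos': (100, 200)}

asyncio.run(do_thing())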

The standard Python REPL does not support top-level await, so asyncio.run() can be used to run a coroutine outside of any function. Many of our docstring examples use this pattern.

>>> import asyncio
>>> from cool_seq_tool.app import CoolSeqTool
>>> mane_mapper = CoolSeqTool().mane_transcript
>>> result = asyncio.run(mane_mapper.g_to_grch38("NC_000001.11", 100, 200))
>>> print(result)
{'ac': 'NC_000001.11', 'pos': (100, 200)}

See the asyncio module documentation for more information.
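The coroutine behaviour described in the note above can be reproduced without a database or SeqRepo. The sketch below is a minimal, self-contained illustration that assumes a hypothetical stand-in, fake_g_to_grch38, in place of ManeTranscript.g_to_grch38. It also shows asyncio.gather awaiting independent calls concurrently; that is standard asyncio, not a Cool-Seq-Tool feature, and whether a given Cool-Seq-Tool method is suited to concurrent use is not covered here.

```python
import asyncio


async def fake_g_to_grch38(ac: str, start: int, end: int) -> dict:
    """Hypothetical stand-in for an async lookup such as ManeTranscript.g_to_grch38."""
    await asyncio.sleep(0)  # yield to the event loop, as a real database query would
    return {"ac": ac, "pos": (start, end)}


async def main() -> list:
    # Calling an async function only creates a coroutine object; nothing runs yet.
    pending = fake_g_to_grch38("NC_000001.11", 100, 200)
    print(type(pending).__name__)  # coroutine
    first = await pending

    # Independent coroutines can be awaited together with asyncio.gather,
    # which returns their results in the order the coroutines were passed.
    others = await asyncio.gather(
        fake_g_to_grch38("NC_000001.11", 300, 400),
        fake_g_to_grch38("NC_000007.14", 500, 600),
    )
    return [first, *others]


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```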

REST server

Core Cool-Seq-Tool functions are also available through a REST HTTP interface built with FastAPI. Use the uvicorn shell command to start a server instance:

uvicorn cool_seq_tool.api:app

By default, uvicorn listens on port 8000. Once the server is running, open http://localhost:8000/cool_seq_tool in a web browser to view OpenAPI docs describing the available endpoints.

REST routes are defined using the FastAPI APIRouter class, so they can also be mounted in other FastAPI applications:

from fastapi import FastAPI
from cool_seq_tool.routers import mane

app = FastAPI()
# Mount Cool-Seq-Tool's MANE routes on this application
app.include_router(mane.router)

Environment configuration

Individual classes accept initialization arguments that configure their data sources. Most of these parameters can also be set through environment variables, which is convenient in settings such as cloud deployments.
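One common way to combine constructor arguments with environment variables is a fixed precedence: an explicit argument first, then the environment variable, then a built-in default. The sketch below assumes that precedence and uses a hypothetical helper, resolve_setting; it illustrates the pattern and is not Cool-Seq-Tool's own configuration code. The default value is the UTA_DB_URL default listed in the table.

```python
import os
from typing import Optional

# Default taken from the UTA_DB_URL entry in the configuration table.
DEFAULT_UTA_DB_URL = "postgresql://uta_admin:uta@localhost:5433/uta/uta_20210129b"


def resolve_setting(explicit: Optional[str], env_var: str, default: str) -> str:
    """Hypothetical helper: explicit argument > environment variable > default."""
    if explicit is not None:
        return explicit
    # os.environ.get falls back to the default when the variable is unset.
    return os.environ.get(env_var, default)


if __name__ == "__main__":
    # Prints the value that would be used in the current environment.
    print(resolve_setting(None, "UTA_DB_URL", DEFAULT_UTA_DB_URL))
```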

Variable

Description

LRG_REFSEQGENE_PATH

Path to the LRG_RefSeqGene file. Used in TranscriptMappings to provide mappings between gene symbols and RefSeq/Ensembl transcript accessions. If not defined, defaults to the most recent version (formatted as data/LRG_RefSeqGene_YYYYMMDD) within the Cool-Seq-Tool library directory. Cool-Seq-Tool will acquire this data automatically if no configuration is provided.

TRANSCRIPT_MAPPINGS_PATH

Path to the transcript mapping file generated from Ensembl BioMart. Used in TranscriptMappings. If not defined, uses a copy of the file that is bundled with the Cool-Seq-Tool installation. See the contributor instructions for information on manually rebuilding it.

MANE_SUMMARY_PATH

Path to the MANE Summary file. Used in ManeTranscriptMappings to provide MANE transcript annotations. If not defined, defaults to the most recent version (formatted as data/MANE.GRCh38vX.X.summary.txt) within the Cool-Seq-Tool library directory.

SEQREPO_ROOT_DIR

Path to the SeqRepo directory (i.e., the directory containing the aliases.sqlite3 database file and the sequences directory). Used by SeqRepoAccess (cool_seq_tool.handlers.seqrepo_access.SeqRepoAccess). If not defined, defaults to /usr/local/share/seqrepo/latest.

UTA_DB_URL

A connection URI of the form postgresql://<user>:<password>@<host>:<port>/<database>/<schema>, used by the cool_seq_tool.sources.uta_database.UtaDatabase class. It follows the libpq URI format, except that the trailing /<schema> segment is a Cool-Seq-Tool addition and not part of standard libpq syntax. By default, it is set to postgresql://uta_admin:uta@localhost:5433/uta/uta_20210129b.

LIFTOVER_CHAIN_37_TO_38

A path to a chainfile for lifting from GRCh37 to GRCh38. Used by cool_seq_tool.sources.uta_database.UtaDatabase as input to agct. If not provided, agct will fetch it automatically from UCSC.

LIFTOVER_CHAIN_38_TO_37

A path to a chainfile for lifting from GRCh38 to GRCh37. Used by cool_seq_tool.sources.uta_database.UtaDatabase as input to agct. If not provided, agct will fetch it automatically from UCSC.
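To make the UTA_DB_URL format concrete, the sketch below splits such a string into its parts with the standard library's urllib.parse. The helper name split_uta_db_url is hypothetical, and this is not the parser UtaDatabase itself uses; it only shows how the extra /<schema> path segment sits after the database name.

```python
from urllib.parse import urlparse


def split_uta_db_url(url: str) -> dict:
    """Hypothetical helper: split a UTA_DB_URL-style string into its parts."""
    parsed = urlparse(url)
    # The path looks like "/<database>/<schema>"; the schema segment is the
    # Cool-Seq-Tool addition that plain libpq URIs do not have.
    database, _, schema = parsed.path.lstrip("/").partition("/")
    return {
        "user": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,  # parsed as an int, or None if absent
        "database": database,
        "schema": schema or None,
    }


if __name__ == "__main__":
    print(split_uta_db_url("postgresql://uta_admin:uta@localhost:5433/uta/uta_20210129b"))
```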

Schema support

Many genomic data objects produced by Cool-Seq-Tool conform to the GA4GH Variation Representation Specification (VRS), using models provided by the VRS-Python library.
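The ga4gh:SQ. alias returned by translate_alias in the first example is a GA4GH computed sequence identifier, built from a truncated SHA-512 digest (sha512t24u) of the sequence. The standard-library sketch below shows that digest; it is not Cool-Seq-Tool or VRS-Python code (VRS-Python ships its own implementation). sequence_identifier is a hypothetical helper, and the upper-casing it applies is an assumed normalization for plain nucleotide or amino-acid strings.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, then base64url-encoded."""
    digest = hashlib.sha512(blob).digest()[:24]
    # 24 bytes encode to exactly 32 base64 characters, so no padding is produced.
    return base64.urlsafe_b64encode(digest).decode("ascii")


def sequence_identifier(sequence: str) -> str:
    """Hypothetical helper: build a 'ga4gh:SQ.' identifier for a raw sequence.

    Assumes upper-casing is the only normalization the sequence needs.
    """
    return "ga4gh:SQ." + sha512t24u(sequence.upper().encode("ascii"))


if __name__ == "__main__":
    print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
    print(sequence_identifier("acgt"))
```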