cool_seq_tool.resources.data_files

Fetch data files regarding transcript mapping and annotation.

class cool_seq_tool.resources.data_files.DataFile(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Constrain legal values for file resource fetching in get_data_file().

LRG_REFSEQGENE = 'lrg_refseqgene'
MANE_REFSEQ_GENOMIC = 'mane_refseq_genomic'
MANE_SUMMARY = 'mane_summary'
TRANSCRIPT_MAPPINGS = 'transcript_mappings'
lower()

Return lower-cased value

Return type:

str

Returns:

lower case string
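
As an illustration of the enum described above, here is a minimal sketch. It assumes a string-valued Enum; the class name DataFileSketch is hypothetical and stands in for DataFile, and the library's actual implementation may differ.

```python
from enum import Enum


class DataFileSketch(str, Enum):
    """Hypothetical stand-in for DataFile; member values copied from the listing above."""

    LRG_REFSEQGENE = "lrg_refseqgene"
    MANE_REFSEQ_GENOMIC = "mane_refseq_genomic"
    MANE_SUMMARY = "mane_summary"
    TRANSCRIPT_MAPPINGS = "transcript_mappings"

    def lower(self) -> str:
        """Return the member's value as a lower-case string."""
        return self.value.lower()


# Members can be looked up by value, and lower() yields a plain string.
member = DataFileSketch("mane_summary")
print(member.name, member.lower())
```

Restricting the argument to enum members means an unsupported resource name fails at lookup time rather than deep inside the file-fetching logic.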

cool_seq_tool.resources.data_files.get_data_file(resource, from_local=False)

Acquire Cool-Seq-Tool file dependency.

The location of each resource can be set with an environment variable:

  • DataFile.TRANSCRIPT_MAPPINGS -> TRANSCRIPT_MAPPINGS_PATH

  • DataFile.MANE_SUMMARY -> MANE_SUMMARY_PATH

  • DataFile.MANE_REFSEQ_GENOMIC -> MANE_REFSEQ_GENOMIC_PATH

  • DataFile.LRG_REFSEQGENE -> LRG_REFSEQGENE_PATH

Otherwise, this function falls back on default expected locations:

  • transcript_mappings.tsv is bundled with this library.

  • LRG RefseqGene and MANE summary files are acquired from NCBI using wags-tails if they are unavailable locally or out of date.
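
The lookup order described above (environment variable first, default location second) can be sketched as follows. This is only an illustration: env_var_for and configured_path are hypothetical helper names, not part of the library, and the sketch assumes each variable name is the member name plus a _PATH suffix, as in the list above.

```python
import os
from pathlib import Path
from typing import Optional


def env_var_for(member_name: str) -> str:
    """Hypothetical helper: map a member name to its environment variable name."""
    return f"{member_name}_PATH"


def configured_path(member_name: str) -> Optional[Path]:
    """Return the path set via environment variable, or None if it is unset.

    A None result signals that the caller should fall back on the default
    location (bundled file or a fetched copy).
    """
    raw = os.environ.get(env_var_for(member_name))
    return Path(raw) if raw else None


# Example: point the MANE summary at a custom file, leave the others unset.
os.environ["MANE_SUMMARY_PATH"] = "/data/custom_mane_summary.txt"
print(configured_path("MANE_SUMMARY"))
```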

Parameters:
  • resource (DataFile) – resource to fetch

  • from_local (bool) – if True, don’t check for or acquire the latest version; provide the most recent locally available file, and raise FileNotFoundError if none is found

Return type:

Path

Returns:

path to file. Consuming functions can assume that it exists and is a file.

Raises:
  • FileNotFoundError – if file location configured by env var doesn’t exist

  • ValueError – if file location configured by env var isn’t a file
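
The two error conditions listed above can be illustrated with a small validation routine. This is a sketch under stated assumptions, not the library's code: validate_configured_path is a hypothetical name, and the error messages are invented for the example.

```python
from pathlib import Path


def validate_configured_path(path: Path) -> Path:
    """Check a path taken from an environment variable (hypothetical helper).

    Raises FileNotFoundError if nothing exists at the location, and
    ValueError if something exists there but is not a regular file.
    """
    if not path.exists():
        raise FileNotFoundError(f"No file exists at configured location: {path}")
    if not path.is_file():
        raise ValueError(f"Configured location is not a file: {path}")
    return path


# A directory exists but is not a file, so it triggers the ValueError branch.
try:
    validate_configured_path(Path("."))
except ValueError as err:
    print(f"rejected: {err}")
```

With the library installed, the documented call takes the form get_data_file(DataFile.MANE_SUMMARY, from_local=True); callers that set one of the environment variables may want to catch both exceptions around it.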