Synthetic Assay (Private)
Generate multi-sample synthetic genome files with a given number of records and a given record length.
Two files are generated: a sequence file and a label file.
The sequence file consists of randomly generated sequences of DNA bases of a given length.
A given motif (a short pattern of bases) is embedded into a random subset of rows, at a location in each row drawn from a uniform distribution.
The label file labels the rows in the sequence file.
The label file consists of single-column rows of 0s and 1s (binary labels), where 1 indicates the presence of the motif and 0 indicates its absence in the corresponding row of the sequence file.
The presence of the motif is taken to indicate that the sequence exhibits a protein-binding property.
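The generation step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the `positive_fraction` and `seed` parameters, and the one-record-per-line file layout are all assumptions made for the example.

```python
import random

BASES = "ACGT"  # the four DNA bases


def generate_assay(num_records, record_length, motif,
                   positive_fraction=0.5, seed=None):
    """Return (sequences, labels) for a synthetic assay.

    Hypothetical helper: parameter names are illustrative only.
    """
    if len(motif) > record_length:
        raise ValueError("motif is longer than the record length")
    rng = random.Random(seed)

    # Pick the random subset of rows that will carry the motif.
    num_positive = round(num_records * positive_fraction)
    positive_rows = set(rng.sample(range(num_records), num_positive))

    sequences, labels = [], []
    for row in range(num_records):
        bases = rng.choices(BASES, k=record_length)
        if row in positive_rows:
            # Uniformly choose a start position where the motif still fits.
            start = rng.randint(0, record_length - len(motif))
            bases[start:start + len(motif)] = motif
            labels.append(1)
        else:
            labels.append(0)
        sequences.append("".join(bases))
    return sequences, labels


def write_assay(sequences, labels, sequence_path, label_path):
    """Write one sequence per line and one 0/1 label per line (assumed layout)."""
    with open(sequence_path, "w") as seq_file:
        seq_file.write("\n".join(sequences) + "\n")
    with open(label_path, "w") as label_file:
        label_file.write("\n".join(str(label) for label in labels) + "\n")
```

In this sketch a row labeled 0 can still contain the motif by chance, since its bases are random. The description above does not say whether such rows are filtered or relabeled, so the sketch leaves them as they are.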
Key Capabilities:
- Store files in cloud object storage.
- Catalog files in Watson Knowledge Catalog.
- Use ICOS segmented upload capability to support extremely large files.
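
To illustrate the idea behind a segmented upload, the sketch below splits a file of a given size into consecutive byte ranges that could each be sent as a separate part. It shows only the splitting arithmetic and makes no storage SDK calls. The function name `plan_segments` and the segment size used in the example are assumptions, and how the project actually invokes the ICOS upload is not shown here.

```python
def plan_segments(file_size, segment_size):
    """Return a list of (offset, length) byte ranges covering the file.

    Hypothetical helper for illustration; not part of any storage SDK.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    segments = []
    offset = 0
    while offset < file_size:
        # The last segment may be shorter than segment_size.
        length = min(segment_size, file_size - offset)
        segments.append((offset, length))
        offset += length
    return segments


# Example: a 10-byte file split into 4-byte segments.
example_segments = plan_segments(10, 4)
```

Each range can then be read with `seek(offset)` and `read(length)` and uploaded independently, so the whole file never has to be held in memory at once.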
Status:
Functional
Context:
Part of Project Expo