

Synthetic Assay (Private)

Generate multi-sample synthetic genome files with a given number of records and a given record length.

Two files are generated: a sequence file and a label file.

The sequence file consists of randomly generated sequences of DNA bases, each of a given length.

A given motif (a short pattern of bases) is embedded into a random subset of rows, at a location within each row drawn from a uniform distribution.
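The generation step described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `generate_assay`, the `motif_fraction` and `seed` parameters, and the choice to overwrite bases in place (rather than insert them) are all assumptions made for the example.

```python
import random

BASES = "ACGT"  # the four DNA bases


def generate_assay(num_records, record_length, motif, motif_fraction=0.5, seed=None):
    """Return (sequences, labels) for a synthetic assay.

    Hypothetical helper: the names and parameters are illustrative only.
    """
    if len(motif) > record_length:
        raise ValueError("motif must not be longer than the record length")

    rng = random.Random(seed)

    # Choose a random subset of rows that will carry the motif.
    num_positive = round(num_records * motif_fraction)
    positive_rows = set(rng.sample(range(num_records), num_positive))

    sequences, labels = [], []
    for row in range(num_records):
        # Start from a uniformly random sequence of bases.
        bases = [rng.choice(BASES) for _ in range(record_length)]
        if row in positive_rows:
            # Pick the start position uniformly among all valid offsets
            # and overwrite that stretch with the motif.
            start = rng.randint(0, record_length - len(motif))
            bases[start:start + len(motif)] = motif
            labels.append(1)
        else:
            labels.append(0)
        sequences.append("".join(bases))
    return sequences, labels


if __name__ == "__main__":
    seqs, labs = generate_assay(5, 20, "GATTACA", motif_fraction=0.4, seed=0)
    for seq, lab in zip(seqs, labs):
        print(lab, seq)
```

Note that this sketch does not check whether an unmodified random row happens to contain the motif by chance. For short motifs and long records that can occur, so a stricter generator might re-draw such rows.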

The label file labels the rows in the sequence file.

The label file consists of single-column rows of 0s and 1s (binary labels), where a 1 indicates the presence of the motif and a 0 indicates its absence in the corresponding row of the sequence file.
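To illustrate the row-by-row correspondence between the two files, here is a small sketch that writes and reads them back. The on-disk format is an assumption (plain text, one sequence per line and one label per line), and the helper names `write_assay` and `read_assay` are hypothetical.

```python
from pathlib import Path


def write_assay(sequences, labels, sequence_path, label_path):
    """Write sequences and labels to two line-aligned text files.

    Assumed format: row i of the label file describes row i of the
    sequence file.
    """
    if len(sequences) != len(labels):
        raise ValueError("sequences and labels must have the same number of rows")
    Path(sequence_path).write_text("".join(f"{seq}\n" for seq in sequences))
    Path(label_path).write_text("".join(f"{label}\n" for label in labels))


def read_assay(sequence_path, label_path):
    """Read the two files back into parallel lists."""
    sequences = Path(sequence_path).read_text().splitlines()
    labels = [int(line) for line in Path(label_path).read_text().splitlines()]
    if len(sequences) != len(labels):
        raise ValueError("sequence and label files are not row-aligned")
    return sequences, labels
```

Keeping the labels in a separate single-column file means the sequence file can be consumed as-is by tools that expect only sequences, at the cost of having to keep the two files in sync.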

The presence of the motif is taken to indicate that the sequence exhibits a protein-binding property.

Key Capabilities:

Status:

Functional

Context:

Part of Project Expo