Required File Preparation

Before running nanomotif for motif detection and analysis, ensure that you have prepared the necessary input files. These include a genome assembly file, a methylation pileup file, and a contig-bin relationship file.


Assembly

The assembly file should contain all contigs in FASTA format. Each header should have a unique contig identifier. The sequence should only include standard nucleotide or IUPAC characters (either upper or lower case). Nanomotif has been primarily developed and tested using assemblies generated by Flye.

Requirements:

  • Format: FASTA

  • Contains all contigs for evaluation

  • Contig ID in the FASTA header

  • IUPAC-compliant characters only
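
The requirements above can be checked before running nanomotif. The following is a minimal sketch, not part of nanomotif: check_assembly is a hypothetical helper name, and it assumes the contig ID is the header text up to the first whitespace.

```shell
# check_assembly FILE
# Hypothetical helper (not part of nanomotif). Reports duplicate contig IDs
# and sequence lines that contain characters outside the IUPAC nucleotide codes.
check_assembly() {
    awk '
        /^>/ {
            # Assumption: contig ID = header up to the first whitespace, without ">"
            id = substr($1, 2)
            if (seen[id]++) { printf "duplicate contig ID: %s\n", id; bad = 1 }
            next
        }
        # IUPAC nucleotide codes, upper or lower case
        $0 !~ /^[ACGTURYSWKMBDHVNacgturyswkmbdhvn]*$/ {
            printf "non-IUPAC character on line %d\n", NR; bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: check_assembly "$ASSEMBLY" && echo "assembly looks OK". The function exits with a non-zero status if it reports any problem.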


Methylation Pileup

The methylation pileup file reports, for each position, how many mapped reads show evidence of methylation. Nanomotif accepts both raw pileup files and bgzipped pileup files (with a .gz extension). If you use a bgzipped file, ensure that it is indexed with tabix, or compress it with epimetheus bgzip compress. Using a bgzipped and indexed file significantly reduces processing time.

To generate this file:

  1. Map reads (with methylation calls) to the assembly.

  2. Use modkit pileup to create the pileup.

Example commands:

MODCALLS="path/to/reads/with/methylation/calls.bam"
ASSEMBLY="path/to/assembly.fa"
MAPPING="path/to/generated/mapping.bam"
PILEUP="path/to/generated/pileup.bed"

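# Carry the MM/ML methylation tags through mapping (-T on samtools, -y on minimap2)
samtools fastq -T MM,ML "$MODCALLS" | \
    minimap2 -ax map-ont -y "$ASSEMBLY" - | \
    samtools view -bS | \
    samtools sort -o "$MAPPING"

# modkit pileup expects a sorted and indexed BAM
samtools index "$MAPPING"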

modkit pileup --only-tabs $MAPPING $PILEUP

epimetheus bgzip compress -i $PILEUP # add --keep to retain the uncompressed pileup file
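
The tabix route mentioned earlier can be sketched with htslib's bgzip and tabix instead of epimetheus. This is a sketch under assumptions: index_pileup is a hypothetical helper name, and the bed preset is assumed to be suitable for a modkit pileup; verify the result against your nanomotif version.

```shell
# index_pileup FILE
# Hypothetical helper (not part of nanomotif or epimetheus).
# Writes FILE.gz next to the original pileup and indexes it with tabix.
index_pileup() {
    # bgzip -c writes to stdout, so the uncompressed pileup is kept
    bgzip -c "$1" > "$1.gz" &&
        # -p bed: treat columns 1-3 as contig, start, end (assumed preset)
        tabix -p bed "$1.gz"
}
```

Usage: index_pileup "$PILEUP" produces "$PILEUP.gz" and, through tabix, an accompanying .tbi index.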

Expected format: The pileup file is a tab-delimited table where each row represents a position on a contig, including information about methylation status.

Running “head” on the pileup file should produce a table similar to the one below:

contig_3	0	1	m	133	-	0	1	255,0,0	133	0.00	0	133	0	0	6	0	0
contig_3	1	2	a	174	+	1	2	255,0,0	174	1.72	3	171	0	0	3	0	0
contig_3	2	3	a	172	+	2	3	255,0,0	172	2.33	4	168	0	0	7	0	0
contig_3	3	4	a	178	+	3	4	255,0,0	178	0.56	1	177	0	0	2	0	0
contig_3	4	5	a	177	+	4	5	255,0,0	177	2.82	5	172	0	0	5	0	0
contig_3	5	6	a	179	+	5	6	255,0,0	179	2.79	5	174	0	0	3	2	0
contig_3	5	6	m	1	+	5	6	255,0,0	1	0.00	0	1	0	0	3	180	0
contig_3	5	6	a	1	-	5	6	255,0,0	1	0.00	0	1	0	0	0	156	0
contig_3	6	7	m	183	+	6	7	255,0,0	183	0.55	1	182	0	0	1	0	0
contig_3	6	7	a	4	-	6	7	255,0,0	4	0.00	0	4	0	0	0	153	0
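
A quick sanity check of the pileup layout can catch a malformed file before motif detection. The sketch below is not part of nanomotif or modkit: check_pileup is a hypothetical helper name, and it assumes the modkit column layout in which column 10 is the valid coverage, column 11 the percent modified, and column 12 the number of modified calls. For a bgzipped pileup, decompress to a temporary file first.

```shell
# check_pileup FILE
# Hypothetical helper. Checks that every row has 18 tab-separated columns and
# that column 11 matches 100 * column 12 / column 10 (assumed modkit layout).
check_pileup() {
    awk -F'\t' '
        NF != 18 {
            printf "line %d: expected 18 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $10 > 0 {
            diff = $11 - 100 * $12 / $10
            if (diff < 0) diff = -diff
            # Percentages are printed with two decimals, so allow rounding error
            if (diff > 0.01) {
                printf "line %d: inconsistent percent modified\n", NR
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: check_pileup "$PILEUP" || echo "pileup needs attention". In the example rows, 3 modified calls out of 174 valid calls gives 1.72 percent, which matches column 11.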

Considerations:

  • Use untrimmed reads for mapping to avoid downstream errors.

  • Running modkit pileup with default parameters may set a low methylation threshold and introduce noise. Passing --filter-threshold 0.7 is recommended to reduce noise and improve motif detection quality.
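
The threshold recommendation can be folded into the pileup command. The wrapper below is only a sketch: make_pileup is a hypothetical helper name, and the optional third argument for overriding the threshold is an illustration rather than anything nanomotif requires.

```shell
# make_pileup MAPPING PILEUP [THRESHOLD]
# Hypothetical wrapper around modkit pileup.
# MAPPING: sorted, indexed BAM; PILEUP: output path; THRESHOLD defaults to 0.7.
make_pileup() {
    modkit pileup --only-tabs --filter-threshold "${3:-0.7}" "$1" "$2"
}
```

Usage: make_pileup "$MAPPING" "$PILEUP" runs the same pileup command as in the example commands, with the filter threshold fixed at 0.7 instead of being left to modkit's default.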


Contig-Bin Relationship

For analyses that require binning, you need a contig-bin relationship. It maps each contig to its corresponding bin, which is essential for binning-based motif discovery.

This information can be passed in one of three ways:

  1. Contig-Bin File: A file that explicitly maps contigs to bins

  2. Bin FASTA Files: A directory of bin FASTA files, where each file corresponds to a bin

  3. List of bin FASTAs: A list of bin FASTA files, where each file corresponds to a bin

Creating a Contig-Bin File

This file links each contig to its corresponding bin. It is a tab-separated file with two columns and no header:

  • Column 1: Contig ID

  • Column 2: Bin ID

If you have a folder of bin FASTA files (one file per bin), you can generate the contig-bin file by extracting contig IDs and their associated bin filenames, then formatting this information into a two-column TSV.

BINS="/path/to/bins/fasta"    # Bins directory
BIN_EXT="fa"                  # Bins file extension
OUT="contig_bin.tsv"          # contig-bin output destination

# -H prints the file name even when only one bin file matches
grep -H "^>" "${BINS}"/*."${BIN_EXT}" | \
        sed "s/.*\///" | \
        sed "s/\.${BIN_EXT}:>/\t/" | \
        awk -F'\t' '{print $2 "\t" $1}' > "$OUT"

Example output:

contig_1	bin1
contig_2	bin1
contig_3	bin1
contig_4	bin2
contig_5	bin2
contig_6	bin3
contig_7	bin3
contig_8	bin3
contig_9	bin3
contig_10	bin1
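
Before passing the contig-bin file to nanomotif, it can help to confirm that every line has exactly two tab-separated columns and to see how many contigs each bin holds. The sketch below is not part of nanomotif; summarize_contig_bin is a hypothetical helper name.

```shell
# summarize_contig_bin FILE
# Hypothetical helper. Prints "bin<TAB>number of contigs" sorted by bin ID and
# exits non-zero if any line is not two non-empty tab-separated columns.
summarize_contig_bin() {
    awk -F'\t' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected two non-empty columns\n", NR > "/dev/stderr"
            bad = 1
            next
        }
        { count[$2]++ }
        END {
            # Sort inside awk so the function keeps the exit status of awk
            for (bin in count) printf "%s\t%d\n", bin, count[bin] | "sort"
            close("sort")
            exit bad + 0
        }
    ' "$1"
}
```

Usage: summarize_contig_bin "$OUT". For the example output shown here, this reports 4 contigs in bin1, 2 in bin2, and 4 in bin3.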