Code Examples
=============

Counting microRNAs
++++++++++++++++++

Requires you provide at least one fastq file and have a short read aligner installed. You should also
specify the species being mapped to::

    import smallrnaseq as smrna
    res = smrna.map_mirbase(files=['test_1.fastq','test_2.fastq'], overwrite=True, aligner='bowtie',
                        species='hsa', pad5=3, pad3=5)

Counting isomiRs
++++++++++++++++

This method is used to count isomiRs using results from previously mapped reads. So a sam file is
required::

    smrna.count_isomirs(samfile, truecounts, species='bta')

Mapping to the genome
+++++++++++++++++++++

This requires a reference genome and a gtf file with miRNA features::

    featcounts = srseq.map_genome_features(['test_1.fastq'], 'bos_taurus', gtffile,
                                        outpath='ncrna_map', aligner='subread', merge=True)

Novel miRNA prediction
++++++++++++++++++++++

The built-in method for novel prediction should be considered a somewhat 'quick and dirty'
method at present but is relatively fast and convenient to use. The basic idea is to take
clusters of reads that could be mature sequence and find suitable precursors. Structural
features of each precursor are then scored using a classifier. The best candidate is selected
is there is at least one. We have followed a similar approach to the miRanalyzer method.

The following features are currently used in our algorithm, most are the same as those used
in sRNAbench (miRanalyzer). The diagram below may help to clarify some of the terminology used.

.. image:: https://raw.githubusercontent.com/dmnfarrell/smallrnaseq/master/img/mirna_example.png

To predict miRNAs you need to have run mapping on genome. Then use the sam file and read
counts to get the true reads and input this into the method find_mirnas with a reference
genome fasta file. The reference fasta must match the bowtie index you used for alignment::

    from smallrnaseq import novel
    import pandas as pd
    #single file prediction
    readcounts = pd.read_csv('countsfile.csv')
    samfile = 'mysamfile.sam'
    reads = utils.get_aligned_reads(samfile, readcounts)
    new = novel.find_mirnas(reads, ref_fasta)

Differential Expression
+++++++++++++++++++++++

Assuming we have all the raw files, they need to be adapter trimmed.
Optionally you can remove other ncrnas before counting your target rnas class,
though that may not be advisable.
The following code maps all the files to bovine mature miRNAs and counts the mapped genes,
then saves the results to a csv file which has the counts in one column per sample.
You can skip this if you already have the counts file::

    import pandas as pd
    import smallrnaseq as smrna
    from smallrnaseq import base, utils, de

    path = 'pathtodata'
    base.BOWTIE_INDEXES = 'bowtie_index'
    refs = ['mirbase-bta'] #name of bowtie index

    files = glob.glob(path+'/*.fastq')
    outpath = 'ncrna_map'
    #map to selected annotation files
    counts = smrna.map_rnas(files, refs, outpath, overwrite=True)
    R = smrna.pivot_count_data(counts, idxcols=['name','db'])
    R.to_csv('mirna_counts.csv',index=False)