jellyfish-manual-1.1

Jellyfish: A fast k-mer counter
G. Marcais and C. Kingsford
February 17, 2012
Version 1.1.4
Abstract
Jellyfish is a software program that counts k-mers in DNA sequences.
1 Synopsis
jellyfish count [-o prefix] [-m merlength] [-t threads] [-s hashsize] [--both-strands] file [file ...]
jellyfish merge hash [hash ...]
jellyfish dump hash
jellyfish stats hash
jellyfish histo [-h high] [-l low] [-i increment] hash
jellyfish query hash
jellyfish cite
Plus equivalent versions for Quake mode: qhisto, qdump and qmerge.
2 Description
Jellyfish is a k-mer counter based on a multi-threaded hash table implementation.
2.1 Counting and merging
To count k-mers, use a command like:
jellyfish count -m 22 -o output -c 3 -s 10000000 -t 32 input.fasta
This will count the 22-mers in input.fasta with 32 threads. The counter field in the hash uses only 3 bits and the hash has at least 10 million entries.
The output files will be named output_0, output_1, etc. (the prefix is specified with the -o switch). If the hash is large enough (as specified by the -s switch) to fit all the k-mers, there will be only one output file, named output_0. If the hash fills up before all the mers have been read, the hash is dumped to disk and zeroed out, and reading of mers resumes. Multiple intermediary files will then be present on disk, named output_0, output_1, etc.
To obtain correct results from the other subcommands (such as histo, stats, etc.), the multiple output files, if any, need to be merged into one with the merge command, for example:
jellyfish merge -o output.jf output_*
Should you get many intermediary output files (say hundreds), the size of the hash table is too small. Rerunning Jellyfish with a larger size (option -s) is probably faster than merging all the intermediary files.
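As a conceptual illustration only, the effect of merging can be pictured as adding up, for each k-mer, the counts found in every intermediary file. The sketch below assumes each file has already been loaded into a plain Python dictionary; the function name merge_counts is hypothetical and this is not how Jellyfish reads or writes its binary files.

```python
from collections import Counter

def merge_counts(*tables):
    """Sum per-k-mer counts across several partial count tables.

    Each table is a dict mapping a k-mer string to its count in one
    intermediary file (an assumption made for this sketch).
    """
    merged = Counter()
    for table in tables:
        # Counter.update adds counts instead of overwriting them.
        merged.update(table)
    return dict(merged)

# Two hypothetical partial tables, as if the hash had been dumped twice.
part0 = {"ACG": 3, "CGT": 1}
part1 = {"ACG": 2, "GTA": 4}
merged = merge_counts(part0, part1)
```

A k-mer seen in several intermediary files ends up with a single entry holding the total, which is why the other subcommands need the merged file to report correct numbers.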
2.2 Orientation
When the orientation of the sequences in the input fasta file is not known, e.g. in sequencing reads, using --both-strands (-C) makes the most sense.
For any k-mer m, its canonical representation is m itself or its reverse-complement, whichever comes first lexicographically. With the option -C, only the canonical representations of the mers are stored in the hash, and the count value is the number of occurrences of both the mer and its reverse-complement.
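The canonical representation described above can be sketched in a few lines of Python. This is an illustration of the definition, not Jellyfish's implementation; the function names are hypothetical and the sketch assumes upper-case sequences containing only A, C, G and T.

```python
# Complement table for the four DNA bases (upper case only, an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(mer):
    """Return the reverse-complement of a DNA string."""
    return mer.translate(_COMPLEMENT)[::-1]

def canonical(mer):
    """Return the mer or its reverse-complement, whichever sorts first."""
    rc = reverse_complement(mer)
    return mer if mer <= rc else rc

def count_canonical(seq, k):
    """Count k-mers of seq, storing only canonical representations."""
    counts = {}
    for i in range(len(seq) - k + 1):
        key = canonical(seq[i:i + k])
        counts[key] = counts.get(key, 0) + 1
    return counts

counts = count_canonical("AAATTT", 3)
```

In this example AAA and TTT share one entry, as do AAT and ATT, so each stored count covers both a mer and its reverse-complement.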
2.3 Choosing the hash size
To achieve the best performance, a minimum number of intermediary files should be written to disk. So the parameter -s should be chosen to fit as many k-mers as possible (ideally all of them) while still fitting in memory.
We consider two examples: counting mers in sequencing reads and in a finished genome.
First, suppose we count k-mers in short sequencing reads: there are n reads and there is an average of 1 error per read, where each error generates k unique mers. If the genome size is G, the size of the hash (option -s) needed to fit all k-mers at once is estimated as (G + k*n)/0.8. The division by 0.8 compensates for the maximum usage of approximately 80% of the hash table.
On the other hand, when counting k-mers in an assembled sequence of length G, setting -s to G is appropriate.
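The estimate for sequencing reads can be worked through as a small calculation. The sketch below applies the formula (G + k*n)/0.8 with integer arithmetic to avoid rounding surprises; the function name and the example figures (a 1 Mbp genome, 100,000 reads) are made up for illustration.

```python
def hash_size_for_reads(genome_size, n_reads, k, load_percent=80):
    """Estimate the -s value as (G + k*n) / 0.8, rounded up.

    Assumes, as in the text, about 1 error per read and k unique
    mers per error. load_percent=80 encodes the 0.8 load factor.
    """
    total_mers = genome_size + k * n_reads
    # Ceiling division by load_percent/100 using integers only.
    return (total_mers * 100 + load_percent - 1) // load_percent

# Hypothetical example: G = 1,000,000, n = 100,000 reads, k = 22.
size = hash_size_for_reads(1_000_000, 100_000, 22)
```

For these figures the error term k*n (2.2 million) dominates the genome size itself, which illustrates why read data needs a much larger hash than an assembled sequence of the same genome.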
As a matter of convenience, Jellyfish understands ISO suffixes for the size of the hash. Hence '-s 10M' stands for 10 million entries while '-s 50G' stands for 50 billion entries.
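The meaning of the suffixes can be illustrated with a short parser. Only the M and G suffixes are taken from the text above; the function name parse_size is hypothetical, and this sketch makes no claim about which other suffixes Jellyfish accepts.

```python
# Multipliers for the two suffixes mentioned in the manual.
_SUFFIXES = {"M": 10**6, "G": 10**9}

def parse_size(text):
    """Convert a size such as '10M' or '50G' to a number of entries."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    # No recognized suffix: treat the whole string as a plain integer.
    return int(text)

entries = parse_size("10M")
```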
The actual memory usage of the hash table can be computed as follows. The actual size of the hash will be rounded up to the next power of 2: $s = 2^l$. The parameter $r$ is such that the maximum reprobe value (-p) plus one is less than $2^r$. Then the memory usage per entry in the hash is (in bits, not bytes) $2k - l + r + 1$. The total memory usage of the hash table in bytes is: $2^l \cdot (2k - l + r + 1)/8$.
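The memory formula can be evaluated directly. The sketch below follows the definitions in the paragraph above, taking l as the exponent of the smallest power of 2 not below the requested size and r as the smallest value with (reprobes + 1) < 2^r; the function name is hypothetical and the result is only as accurate as the formula itself.

```python
def hash_memory_bytes(size, k, reprobes=62):
    """Estimate hash table memory as 2^l * (2k - l + r + 1) / 8 bytes."""
    # l: exponent of the smallest power of two that is >= size.
    l = max(size - 1, 0).bit_length()
    # r: smallest exponent such that reprobes + 1 < 2^r.
    r = (reprobes + 1).bit_length()
    bits_per_entry = 2 * k - l + r + 1
    total_bits = (1 << l) * bits_per_entry
    # Round up to whole bytes.
    return (total_bits + 7) // 8

# -s 10000000 with 22-mers and the default of 62 reprobes:
# l = 24, r = 6, so each entry takes 2*22 - 24 + 6 + 1 = 27 bits.
usage = hash_memory_bytes(10_000_000, 22)
```

Note that requesting 10 million entries actually allocates 2^24 (about 16.8 million) entries, so the memory cost jumps each time the requested size crosses a power of 2.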
2.4 Choosing the counting field size
To save space, the hash table supports variable-length counters: a k-mer occurring only a few times will use a small counter, while a k-mer occurring many times will use multiple entries in the hash. The -c option specifies the length (in bits) of the small counter. The trade-off is as follows: a low value will save space per entry in the hash but can potentially increase the number of entries used, hence maybe requiring a larger hash.
In practice, use a value for -c so that most of your k-mers require only 1 entry. For example, to count k-mers in a genome, where most of the sequence is unique, use -c 1 or -c 2. For sequencing reads, use a value for -c large enough to count up to twice the coverage. For example, if the coverage is 10X, choose a counter length of 4 (-c 4) as $2^4 > 10$.
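The rule of thumb can be expressed as finding the smallest bit length whose range exceeds a target count. The helper below is a hypothetical sketch: with the coverage itself as the target it reproduces the manual's example (4 bits for 10X), while using twice the coverage as the target gives one more bit.

```python
def counter_bits(target_count):
    """Smallest c such that 2**c > target_count."""
    # For a non-negative integer x, x.bit_length() is the smallest c
    # with 2**c > x; use at least 1 bit for a usable counter.
    return max(1, target_count.bit_length())

coverage = 10
bits_for_coverage = counter_bits(coverage)       # threshold used in the manual's example
bits_for_twice = counter_bits(2 * coverage)      # threshold of twice the coverage
```

Which target to use is a judgment call; a larger -c costs more bits per entry but keeps more k-mers within a single entry.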
3 Subcommands and options
3.1 count
Usage: jellyfish count [options] file:path+
Count k-mers or qmers in fasta or fastq files
Options (default value in (), *required):
 -m, --mer-len=uint32                  *Length of mer
 -s, --size=uint64                     *Hash size
 -t, --threads=uint32                  Number of threads (1)
 -o, --output=string                   Output prefix (mer_counts)
 -c, --counter-len=Length in bits      Length of counting field (7)
     --out-counter-len=Length in bytes Length of counter field in output (4)
 -C, --both-strands                    Count both strand, canonical representation (false)
 -p, --reprobes=uint32                 Maximum number of reprobes (62)
 -r, --raw                             Write raw database (false)
 -q, --quake                           Quake compatibility mode (false)
     --quality-start=uint32            Starting ASCII for quality values (64)
     --min-quality=uint32              Minimum quality. A base with lesser quality becomes an N (0)
 -L, --lower-count=uint64              Don't output k-mer with count < lower-count
 -U, --upper-count=uint64              Don't output k-mer with count > upper-count
     --matrix=Matrix file              Hash function binary matrix
     --timing=Timing file              Print timing information
     --stats=Stats file                Print stats
     --usage                           Usage
 -h, --help                            This message
     --full-help                       Detailed help
 -V, --version                         Version
3.2 stats
Usage: jellyfish stats [options] db:path
Statistics
Display some statistics about the k-mers in the hash:
 Unique:    Number of k-mers which occur only once.
 Distinct:  Number of k-mers, not counting multiplicity.
 Total:     Number of k-mers, including multiplicity.
 Max_count: Maximum number of occurrences of a k-mer.
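The four statistics can be illustrated on a small in-memory table. The sketch below computes them from a dictionary mapping k-mers to counts; the function name and the sample data are hypothetical, and the code does not read Jellyfish's binary format.

```python
def kmer_stats(counts):
    """Compute the four summary statistics from a k-mer -> count dict."""
    values = list(counts.values())
    return {
        "unique": sum(1 for v in values if v == 1),  # k-mers seen exactly once
        "distinct": len(values),                     # number of different k-mers
        "total": sum(values),                        # occurrences, with multiplicity
        "max_count": max(values, default=0),         # highest single count
    }

stats = kmer_stats({"AAA": 2, "AAT": 1, "ACG": 5})
```

Here three distinct k-mers account for eight occurrences in total, and only one of them (AAT) is unique.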
Options (default value in (), *required):
 -L, --lower-count=uint64  Don't consider k-mer with count < lower-count
 -U, --upper-count=uint64  Don't consider k-mer with count > upper-count
 -v, --verbose             Verbose (false)
 -o, --output=c_string     Output file
     --usage               Usage
 -h, --help                This message
     --full-help           Detailed help
 -V, --version             Version
3.3 histo
Usage: jellyfish histo [options] db:path
Create a histogram of k-mer occurrences
Create a histogram with the number of k-mers having a given count. In bucket 'i' are tallied the k-mers which have a count 'c' satisfying 'low+i*inc <= c < low+(i+1)*inc'. Buckets in the output are labeled by the low end point (low+i*inc).
The last bucket in the output behaves as a catchall: it tallies all k-mers with a count greater than or equal to the low end point of this bucket.
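The bucketing rule can be sketched as follows. This is an illustration of the inequality above and of the catchall behaviour, not Jellyfish's code. Two points are assumptions of the sketch: the last bucket is taken to be the one with the largest low end point not exceeding high, and counts below low are simply skipped. Empty buckets are kept in the result.

```python
def histogram(counts, low=1, high=10000, inc=1):
    """Tally counts into buckets labeled by their low end point."""
    # Buckets start at low, low+inc, ... up to the largest value <= high.
    n_buckets = (high - low) // inc + 1
    tallies = [0] * n_buckets
    for c in counts:
        if c < low:
            continue  # assumption: counts below 'low' are ignored here
        # Bucket i holds low+i*inc <= c < low+(i+1)*inc; the last
        # bucket also catches every larger count.
        i = min((c - low) // inc, n_buckets - 1)
        tallies[i] += 1
    return [(low + i * inc, t) for i, t in enumerate(tallies)]

# Hypothetical counts, bucketed with low=1, high=5, increment=2.
result = histogram([1, 1, 2, 5, 9, 40], low=1, high=5, inc=2)
```

With these settings the buckets are labeled 1, 3 and 5; the counts 9 and 40 both land in the last bucket because it acts as the catchall.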
Options (default value in (), *required):
 -l, --low=uint64        Low count value of histogram (1)
 -h, --high=uint64       High count value of histogram (10000)
 -i, --increment=uint64  Increment value for buckets (1)
 -t, --threads=uint32    Number of threads (1)
 -f, --full              Full histo. Don't skip count 0. (false)
 -o, --output=c_string   Output file
 -v, --verbose           Output information (false)
     --usage             Usage
     --help              This message
     --full-help         Detailed help
 -V, --version           Version
3.4 dump
Usage: jellyfish dump [options] db:path
Dump k-mer counts
By default, dump in a fasta format where the header is the count and the sequence is the sequence of the k-mer. The column format is a 2-column output: k-mer count.
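Both output layouts are easy to read back into a dictionary. The sketch below assumes the fasta header line starts with the usual '>' character followed by the count, and that the column format separates the k-mer and its count by whitespace (space or tab); the function names and sample lines are hypothetical.

```python
def parse_fasta_dump(lines):
    """Read 'header is the count, sequence is the k-mer' fasta records."""
    counts = {}
    pending = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            pending = int(line[1:])   # header line carries the count
        elif pending is not None:
            counts[line] = pending    # next line is the k-mer itself
            pending = None
    return counts

def parse_column_dump(lines):
    """Read 2-column 'k-mer count' records (space or tab separated)."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            counts[fields[0]] = int(fields[1])
    return counts

fasta_counts = parse_fasta_dump([">3", "ACG", ">1", "CGT"])
column_counts = parse_column_dump(["ACG 3", "CGT\t1"])
```

Both calls produce the same table, which makes the column format convenient for line-oriented tools and the fasta format convenient for tools that expect sequence files.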
Options (default value in (), *required):
 -c, --column              Column format (false)
 -t, --tab                 Tab separator (false)
 -L, --lower-count=uint64  Don't output k-mer with count < lower-count
 -U, --upper-count=uint64  Don't output k-mer with count > upper-count
 -o, --output=c_string     Output file
     --usage               Usage
 -h, --help                This message
 -V, --version             Version
3.5 merge
Usage: jellyfish merge [options] input:c_string+
Merge jellyfish databases
Options (default value in (), *required):
 -s, --buffer-size=Buffer length  Length in bytes of input buffer (10000000)
 -o, --output=string              Output file (mer_counts_merged.jf)
     --out-counter-len=uint32     Length (in bytes) of counting field in output (4)
     --out-buffer-size=uint64     Size of output buffer per thread (10000000)
 -v, --verbose                    Be verbose (false)
     --usage                      Usage
 -h, --help                       This message
 -V, --version                    Version
3.6 query
Usage: jellyfish query [options] db:path
Query from a compacted database
Query a hash. It reads k-mers from the standard input and writes the counts on the standard output.
Options (default value in (), *required):
 -C, --both-strands  Both strands (false)
 -c, --cary-bit      Value field as the cary bit information (false)
 -i, --input=file    Input file
 -o, --output=file   Output file
     --usage         Usage
 -h, --help          This message
 -V, --version       Version
3.7 cite
Usage: jellyfish cite [options]
How to cite Jellyfish's paper
Citation of paper
Options (default value in (), *required):
 -b, --bibtex           Bibtex format (false)
 -o, --output=c_string  Output file
     --usage            Usage
 -h, --help             This message
 -V, --version          Version
3.8 qhisto
Usage: jellyfish qhisto [options] db:c_string
Create a histogram of k-mer occurrences
Options (default value in (), *required):
 -l, --low=double        Low count value of histogram (0.0)
 -h, --high=double       High count value of histogram (10000.0)
 -i, --increment=double  Increment value for buckets (1.0)
 -f, --full              Full histo. Don't skip count 0. (false)