VSEARCH supports de novo and reference-based chimera detection,
clustering, full-length and prefix dereplication, rereplication,
reverse complementation, masking, all-vs-all pairwise global
alignment, exact and global alignment searching, shuffling,
subsampling and sorting. It also supports FASTQ file analysis,
filtering, conversion and merging of paired-end reads.
The aim of this project is to create an alternative to the USEARCH
tool developed by Robert C. Edgar (2010).
fastp is a tool designed to provide fast all-in-one preprocessing for FASTQ
files. It is written in C++ and supports multithreading for high
performance.
Upstream build does not use LDFLAGS canonically.
Makefile.in will require restructuring to eliminate workaround.
This patch fixes build on CentOS and build with RELRO on NetBSD.
Also add LICENSE and fig2dev runtime dependency.
HISAT2 is a fast and sensitive alignment program for mapping next-generation
sequencing reads (both DNA and RNA) to a population of human genomes (as well
as to a single reference genome).
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule
sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).
Canu is a hierarchical assembly pipeline which runs in four steps:
- Detect overlaps in high-noise sequences using MHAP
- Generate corrected sequence consensus
- Trim corrected sequences
- Assemble trimmed corrected sequences
Stacks is a software pipeline for building loci from short-read sequences, such
as those generated on the Illumina platform. Stacks was developed to work with
restriction enzyme-based data, such as RAD-seq, for the purpose of building
genetic maps and conducting population genomics and phylogeography.
Kallisto is a program for quantifying abundances of transcripts from RNA-Seq
data, or more generally of target sequences using high-throughput sequencing
reads. It is based on the novel idea of pseudoalignment for rapidly determining
the compatibility of reads with targets, without the need for alignment.
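
The idea of checking compatibility without alignment can be illustrated with a
toy sketch: index every k-mer of each target, then intersect the target sets of
the k-mers found in a read. This is only an illustration of the general idea
under simplifying assumptions, not Kallisto's actual data structures or
algorithm; the function names, sequences and k value are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(targets, k):
    """Map each k-mer to the set of target names that contain it."""
    index = defaultdict(set)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_targets(read, index, k):
    """Intersect the target sets of every k-mer in the read.

    Returns an empty set when the read is shorter than k or when
    no single target contains all of its k-mers.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no target can match any more, stop early
    return compatible or set()


# Hypothetical toy targets that share a prefix and differ in the suffix.
targets = {
    "tx1": "ACGTACGTTAGC",
    "tx2": "ACGTACGTCCGA",
}
index = build_kmer_index(targets, k=5)

shared = compatible_targets("ACGTACGT", index, k=5)  # from the shared prefix
unique = compatible_targets("CGTTAGC", index, k=5)   # from the tx1 suffix
print(sorted(shared), sorted(unique))
```

No base-by-base alignment is computed here: a read is only assigned a set of
targets it could have come from.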
From Eric A. Borisch in pull request NetBSD/pkgsrc#40.
- Do not derive Dataset from dict (#767)
- fixes side effects from initializing with another dataset
- Added missing dict methods that are passed to the tags dict
- Adapted documentation to Dataset changes
- Make sure that the retry order config is reset in the test (#772)
BCFtools is a program for variant calling and manipulating files in the Variant
Call Format (VCF) and its binary counterpart BCF. All commands work
transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
In order to avoid tedious repetition, throughout this document we will use "VCF"
and "BCF" interchangeably, unless specifically noted.
Most commands accept VCF, bgzipped VCF and BCF with filetype detected
automatically even when streaming from a pipe. Indexed VCF and BCF work in all
situations. Unindexed VCF and BCF and streams work in most, but not all
situations. In general, whenever multiple VCFs are read simultaneously, they
must be indexed and therefore also compressed.
Release 1.9:
* Samtools mpileup VCF and BCF output is now deprecated. It is still
functional, but will warn. Please use bcftools mpileup instead.
* Samtools mpileup now handles the '-d' max_depth option differently. There
is no longer an enforced minimum, and '-d 0' is interpreted as limitless
(no maximum - warning this may be slow). The default per-file depth is
now 8000, which matches the value mpileup used to use when processing
a single sample. To get the previous default behaviour use the higher
of 8000 divided by the number of samples across all input files, or 250.
* Samtools stats new features:
- The '--remove-overlaps' option discounts overlapping portions of
templates when computing coverage and mapped base counting.
- When a target file is in use, the number of bases inside the
target is printed and the percentage of target bases with coverage
above a given threshold specified by the '--cov-threshold' option.
- Split base composition and length statistics by first and last reads.
* Samtools faidx new features:
- Now takes long options.
- Now warns about zero-length and truncated sequences due to the
requested range being beyond the end of the sequence.
- Gets a new option (--continue) that allows it to carry on
when a requested sequence was not in the index.
- It is now possible to supply the list of regions to output in a text
file using the new '--region-file' option.
- New '-i' option to make faidx return the reverse complement of
the regions requested.
- faidx now works on FASTQ (returning FASTA) and added a new
fqidx command to index and return FASTQ.
* Samtools collate now has a fast option '-f' that only operates on
primary pairs, dropping secondary and supplementary. It tries to write
pairs to the final output file as soon as both reads have been found.
* Samtools bedcov gets a new '-j' option to make it ignore deletions (D) and
reference skips (N) when computing coverage.
* Small speed up to samtools coordinate sort, by converting it to use
radix sort.
* Samtools idxstats now works on SAM and CRAM files; however, this
is not fast because the indices lack some of the required information.
* Compression levels may now be specified with the level=N
output-fmt-option. E.g. with -O bam,level=3.
* Various documentation improvements.
* Bug-fixes:
- Improved error reporting in several places.
- Various test improvements.
- Fixed failures in the multi-region iterator (view -M) when regions
provided via BED files include overlaps.
- Samtools stats now counts '=' and 'X' CIGAR operators when
counting mapped bases.
- Samtools stats has fixes for insert size filtering (-m, -i).
- Samtools stats -F no longer negates an earlier -d option.
- Fix samtools stats crash when using a target region.
- Samtools sort now keeps to a single thread when the -@ option is absent.
Previously it would spawn a writer thread, which could cause the CPU
usage to go slightly over 100%.
- Fixed samtools phase '-A' option which was incorrectly defined to take
a parameter.
- Fixed compilation problems when using C_INCLUDE_PATH.
- Fixed --version when built from a Git repository.
- Use noenhanced mode for title in plot-bamstats. Prevents unwanted
interpretation of characters like underscore in gnuplot version 5.
- blast2sam.pl now reports perfect match hits (no indels or mismatches).
- Fixed bug in fasta and fastq subcommands where stdout would not be flushed
correctly if the -0 option was used.
- Fixed invalid memory access in mpileup and depth on alignment records
where the sequence is absent.
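
The mpileup note above says the previous default depth is the higher of 8000
divided by the number of samples, or 250. A minimal sketch of that arithmetic
follows; the function name is hypothetical, and the use of integer division is
an assumption, since the note does not say how the quotient is rounded.

```python
def legacy_per_file_depth(n_samples, base_depth=8000, floor=250):
    """Return a '-d' value approximating the pre-1.9 default.

    Assumption: the quotient is truncated to an integer.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return max(base_depth // n_samples, floor)


for n in (1, 10, 64):
    print(n, legacy_per_file_depth(n))
```

With one sample this gives 8000, with 10 samples 800, and with 64 samples the
floor of 250 applies.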
1.9:
If ./configure fails, make will stop working until either configure is re-run successfully, or make distclean is used. This makes configuration failures more obvious.
The default SAM version has been changed to 1.6. This is in line with the latest version specification and indicates that HTSlib supports the CG tag used to store long CIGAR data in BAM format.
New bgzip integrity check option '--test'.
Faidx can now index fastq files as well as fasta. The fastq index adds an extra column to the .fai index which gives the offset to the quality values. New interfaces have been added to htslib/faidx.h to read the fastq index and retrieve the quality values. It is possible to open a fastq index as if fasta (only sequences will be returned), but not the other way round.
New API interfaces to add or update integer, float and array aux tags.
Add level=<number> option to hts_set_opt() to allow the compression level to be set. Setting level=0 enables uncompressed output.
Improved bgzip error reporting.
Better error reporting when CRAM reference files can't be opened.
Fixes to make tests work properly on Windows/MinGW - mainly to handle line ending differences.
Efficiency improvements:
Small speed-up for CRAM indexing.
Reduce the number of unnecessary wake-ups in the thread pool.
Avoid some memory copies when writing data, notably for uncompressed BGZF output.
Bug fixes:
Fix multi-region iterator bugs on CRAM files.
Fixed multi-region iterator bug that caused some reads to be skipped incorrectly when reading BAM files.
Fixed synced_bcf_reader() bug when reading contigs multiple times.
Fixed bug where bcf_hdr_set_samples() did not update the sample dictionary when removing samples.
Fixed bug where the VCF record ref length was calculated incorrectly if an INFO END tag was present. (71b00a)
Fixed warnings found when compiling with gcc 8.1.0.
sam_hdr_read() and sam_hdr_write() will now return an error code if passed a NULL file pointer, instead of crashing.
Fixed possible negative array look-up in sam_parse1() that somehow escaped previous fuzz testing.
Fixed bug where cram range queries could incorrectly report an error when using multiple threads.
Fixed very rare rANS normalisation bug that could cause an assertion failure when writing CRAM files.
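
The fastq index note above mentions an extra .fai column holding the offset to
the quality values. A minimal sketch of reading both index flavours follows.
It assumes tab-separated lines with the fields NAME, LENGTH, OFFSET, LINEBASES
and LINEWIDTH, plus QUALOFFSET for fastq, as the faidx format is commonly
documented; the record type, function name and sample lines are hypothetical.

```python
from typing import NamedTuple, Optional


class FaiRecord(NamedTuple):
    name: str
    length: int                 # number of bases in the sequence
    offset: int                 # byte offset of the first base
    linebases: int              # bases per line
    linewidth: int              # bytes per line, including the newline
    qualoffset: Optional[int]   # byte offset of the qualities (fastq only)


def parse_fai_line(line):
    """Parse one .fai line; fasta lines have 5 fields, fastq lines 6."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}")
    numbers = [int(f) for f in fields[1:]]
    qualoffset = numbers[4] if len(numbers) == 5 else None
    return FaiRecord(fields[0], numbers[0], numbers[1],
                     numbers[2], numbers[3], qualoffset)


# Made-up example lines, not taken from a real index.
fasta_rec = parse_fai_line("chr1\t1000\t6\t60\t61\n")
fastq_rec = parse_fai_line("read1\t100\t7\t100\t101\t110\n")
print(fasta_rec.qualoffset, fastq_rec.qualoffset)
```

Treating the sixth field as optional mirrors the note that a fastq index can be
opened as if it were a fasta index, but not the other way round.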
Pydicom is a pure Python package for working with DICOM files such as medical
images, reports, and radiotherapy objects.
Pydicom makes it easy to read these complex files into natural pythonic
structures for easy manipulation. Modified datasets can be written again to
DICOM format files.
Packaged by Eric A. Borisch via NetBSD/pkgsrc#37, thank you Eric!
Trimmomatic performs a variety of useful trimming tasks for Illumina
paired-end and single-ended data. The selection of trimming steps and their
associated parameters are supplied on the command line. It works with FASTQ
(using phred + 33 or phred + 64 quality scores, depending on the Illumina
pipeline used), either uncompressed or gzipped.
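
The two quality encodings differ only in the ASCII offset added to each score.
A minimal sketch of decoding a FASTQ quality string under either offset
follows; the function name and sample strings are hypothetical, and this is
not Trimmomatic's own code.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of phred scores."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("character below the chosen offset; wrong encoding?")
    return scores


phred33 = decode_qualities("II5!", offset=33)
phred64 = decode_qualities("hhT@", offset=64)
print(phred33, phred64)
```

Both calls yield the same scores, which shows why the encoding has to be known
before any quality-based trimming can be applied.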
Performing substitutions during post-patch breaks tools such as mkpatches,
making it very difficult to regenerate correct patches after making changes,
and often leading to substituted string replacements being committed.