Canu is a fork of the Celera Assembler, designed for high-noise single-molecule
sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION).
Canu is a hierarchical assembly pipeline which runs in four steps:
- Detect overlaps in high-noise sequences using MHAP
- Generate corrected sequence consensus
- Trim corrected sequences
- Assemble trimmed corrected sequences
Stacks is a software pipeline for building loci from short-read sequences, such
as those generated on the Illumina platform. Stacks was developed to work with
restriction enzyme-based data, such as RAD-seq, for the purpose of building
genetic maps and conducting population genomics and phylogeography.
Kallisto is a program for quantifying abundances of transcripts from RNA-Seq
data, or more generally of target sequences using high-throughput sequencing
reads. It is based on the novel idea of pseudoalignment for rapidly determining
the compatibility of reads with targets, without the need for alignment.
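The pseudoalignment idea described above can be illustrated with a toy sketch. This is not kallisto's implementation; it only shows the notion of finding the targets a read is compatible with, without computing a base-level alignment. The function names, the k-mer size and the example sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(targets, k):
    """Map every k-mer to the set of target names that contain it."""
    index = defaultdict(set)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_targets(read, index, k):
    """Intersect the target sets of all k-mers in the read.

    No base-level alignment is computed; a read is "compatible" with a
    target when every one of its k-mers occurs somewhere in that target.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return set()  # no target shares all k-mers seen so far
    return compatible or set()  # reads shorter than k match nothing


# Hypothetical toy transcripts, chosen only for illustration.
targets = {"t1": "ACGTACGGA", "t2": "TTGTACGGC"}
index = build_kmer_index(targets, k=4)
print(sorted(compatible_targets("GTACGG", index, k=4)))
print(sorted(compatible_targets("CGTACG", index, k=4)))
```

The first read is shared by both toy targets, while the second contains a k-mer found only in the first one.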
From Eric A. Borisch in pull request NetBSD/pkgsrc#40.
- Do not derive Dataset from dict (#767)
- fixes side effects from initializing with another dataset
- Added missing dict methods that are passed to the tags dict
- Adapted documentation to Dataset changes
- Make sure that the retry order config is reset in the test (#772)
BCFtools is a program for variant calling and manipulating files in the Variant
Call Format (VCF) and its binary counterpart BCF. All commands work
transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
In order to avoid tedious repetition, throughout this document we will use "VCF"
and "BCF" interchangeably, unless specifically noted.
Most commands accept VCF, bgzipped VCF and BCF with filetype detected
automatically even when streaming from a pipe. Indexed VCF and BCF work in all
situations. Unindexed VCF and BCF and streams work in most, but not all
situations. In general, whenever multiple VCFs are read simultaneously, they
must be indexed and therefore also compressed.
Release 1.9:
* Samtools mpileup VCF and BCF output is now deprecated. It is still
functional, but will warn. Please use bcftools mpileup instead.
* Samtools mpileup now handles the '-d' max_depth option differently. There
is no longer an enforced minimum, and '-d 0' is interpreted as limitless
(no maximum - warning this may be slow). The default per-file depth is
now 8000, which matches the value mpileup used to use when processing
a single sample. To get the previous default behaviour use the higher
of 8000 divided by the number of samples across all input files, or 250.
* Samtools stats new features:
- The '--remove-overlaps' option discounts overlapping portions of
templates when computing coverage and mapped base counting.
- When a target file is in use, the number of bases inside the
target is printed and the percentage of target bases with coverage
above a given threshold specified by the '--cov-threshold' option.
- Split base composition and length statistics by first and last reads.
* Samtools faidx new features:
- Now takes long options.
- Now warns about zero-length and truncated sequences due to the
requested range being beyond the end of the sequence.
- Gets a new option (--continue) that allows it to carry on
when a requested sequence was not in the index.
- It is now possible to supply the list of regions to output in a text
file using the new '--region-file' option.
- New '-i' option to make faidx return the reverse complement of
the regions requested.
- faidx now works on FASTQ (returning FASTA) and added a new
fqidx command to index and return FASTQ.
* Samtools collate now has a fast option '-f' that only operates on
primary pairs, dropping secondary and supplementary. It tries to write
pairs to the final output file as soon as both reads have been found.
* Samtools bedcov gets a new '-j' option to make it ignore deletions (D) and
reference skips (N) when computing coverage.
* Small speed up to samtools coordinate sort, by converting it to use
radix sort.
* Samtools idxstats now works on SAM and CRAM files; however, this
isn't fast due to some information lacking from indices.
* Compression levels may now be specified with the level=N
output-fmt-option. E.g. with -O bam,level=3.
* Various documentation improvements.
* Bug-fixes:
- Improved error reporting in several places.
- Various test improvements.
- Fixed failures in the multi-region iterator (view -M) when regions
provided via BED files include overlaps.
- Samtools stats now counts '=' and 'X' CIGAR operators when
counting mapped bases.
- Samtools stats has fixes for insert size filtering (-m, -i).
- Samtools stats -F no longer negates an earlier -d option.
- Fix samtools stats crash when using a target region.
- Samtools sort now keeps to a single thread when the -@ option is absent.
Previously it would spawn a writer thread, which could cause the CPU
usage to go slightly over 100%.
- Fixed samtools phase '-A' option which was incorrectly defined to take
a parameter.
- Fixed compilation problems when using C_INCLUDE_PATH.
- Fixed --version when built from a Git repository.
- Use noenhanced mode for title in plot-bamstats. Prevents unwanted
interpretation of characters like underscore in gnuplot version 5.
- blast2sam.pl now reports perfect match hits (no indels or mismatches).
- Fixed bug in fasta and fastq subcommands where stdout would not be flushed
correctly if the -0 option was used.
- Fixed invalid memory access in mpileup and depth on alignment records
where the sequence is absent.
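The mpileup note above gives a rule for recovering the previous default depth: the higher of 8000 divided by the number of samples, or 250. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the use of integer division is an assumption, since the note does not say how the quotient is rounded.

```python
def previous_default_depth(n_samples):
    """Per-file depth matching the pre-1.9 mpileup default.

    Rule from the release note: the higher of 8000 divided by the number
    of samples across all input files, or 250. Integer division is an
    assumption of this sketch.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return max(8000 // n_samples, 250)


for n in (1, 4, 100):
    print(n, previous_default_depth(n))
```

The resulting value would then be passed to the '-d' option mentioned in the note.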
1.9:
If ./configure fails, make will stop working until either configure is re-run successfully, or make distclean is used. This makes configuration failures more obvious.
The default SAM version has been changed to 1.6. This is in line with the latest version specification and indicates that HTSlib supports the CG tag used to store long CIGAR data in BAM format.
New bgzip integrity check option '--test'.
Faidx can now index fastq files as well as fasta. The fastq index adds an extra column to the .fai index which gives the offset to the quality values. New interfaces have been added to htslib/faidx.h to read the fastq index and retrieve the quality values. It is possible to open a fastq index as if fasta (only sequences will be returned), but not the other way round.
New API interfaces to add or update integer, float and array aux tags.
Add level=<number> option to hts_set_opt() to allow the compression level to be set. Setting level=0 enables uncompressed output.
Improved bgzip error reporting.
Better error reporting when CRAM reference files can't be opened.
Fixes to make tests work properly on Windows/MinGW - mainly to handle line ending differences.
Efficiency improvements:
Small speed-up for CRAM indexing.
Reduce the number of unnecessary wake-ups in the thread pool.
Avoid some memory copies when writing data, notably for uncompressed BGZF output.
Bug fixes:
Fix multi-region iterator bugs on CRAM files.
Fixed multi-region iterator bug that caused some reads to be skipped incorrectly when reading BAM files.
Fixed synced_bcf_reader() bug when reading contigs multiple times.
Fixed bug where bcf_hdr_set_samples() did not update the sample dictionary when removing samples.
Fixed bug where the VCF record ref length was calculated incorrectly if an INFO END tag was present. (71b00a)
Fixed warnings found when compiling with gcc 8.1.0.
sam_hdr_read() and sam_hdr_write() will now return an error code if passed a NULL file pointer, instead of crashing.
Fixed possible negative array look-up in sam_parse1() that somehow escaped previous fuzz testing.
Fixed bug where cram range queries could incorrectly report an error when using multiple threads.
Fixed very rare rANS normalisation bug that could cause an assertion failure when writing CRAM files.
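The faidx note above says that a FASTQ index adds an extra column to the .fai file giving the offset to the quality values. A minimal sketch of reading such lines follows, assuming the tab-separated column order name, length, offset, line bases, line width, and (for FASTQ only) quality offset. The record type, function name and example values are illustrative and are not part of the HTSlib API.

```python
from typing import NamedTuple, Optional


class FaiRecord(NamedTuple):
    name: str
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base
    linebases: int   # bases per line
    linewidth: int   # bytes per line, including the newline
    qualoffset: Optional[int]  # byte offset of the qualities (FASTQ only)


def parse_fai_line(line):
    """Parse one .fai line; 5 columns for FASTA, 6 for FASTQ."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(fields)}")
    numbers = [int(value) for value in fields[1:]]
    qualoffset = numbers[4] if len(numbers) == 5 else None
    return FaiRecord(fields[0], *numbers[:4], qualoffset)


# Illustrative index lines, not taken from a real file.
print(parse_fai_line("chr1\t66\t5\t30\t31\n"))
print(parse_fai_line("fastq1\t66\t8\t30\t31\t79\n"))
```

A record with `qualoffset` set to `None` comes from a FASTA index, so only sequence data can be retrieved for it.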
Pydicom is a pure Python package for working with DICOM files such as medical
images, reports, and radiotherapy objects.
Pydicom makes it easy to read these complex files into natural pythonic
structures for easy manipulation. Modified datasets can be written again to
DICOM format files.
Packaged by Eric A. Borisch via NetBSD/pkgsrc#37, thank you Eric!
Trimmomatic performs a variety of useful trimming tasks for Illumina
paired-end and single-end data. The selection of trimming steps and their
associated parameters are supplied on the command line. It works with FASTQ
(using phred + 33 or phred + 64 quality scores, depending on the Illumina
pipeline used), either uncompressed or gzipped.
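The difference between the phred + 33 and phred + 64 encodings mentioned above is only the ASCII offset added to each quality score. A minimal sketch of decoding a FASTQ quality string under either encoding follows. The function name is hypothetical and this is not Trimmomatic code.

```python
def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of integer scores.

    Each character encodes one score as chr(score + offset), where the
    offset is 33 or 64 depending on the pipeline that wrote the file.
    """
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality_string]
    # Assumption of this sketch: a negative score is treated as a sign
    # that the wrong offset was chosen for this file.
    if scores and min(scores) < 0:
        raise ValueError("negative score; wrong offset for this file?")
    return scores


print(decode_quality("I5!", offset=33))
print(decode_quality("hT@", offset=64))
```

Both calls decode to the same three scores, which shows that the two strings carry identical quality information under different offsets.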
Performing substitutions during post-patch breaks tools such as mkpatches,
making it very difficult to regenerate correct patches after making changes,
and often leading to substituted string replacements being committed.
Samtools implements various utilities for post-processing alignments in the
SAM, BAM, and CRAM formats, including indexing, variant calling (in conjunction
with bcftools), and a simple alignment viewer.
OK wiz@
HTSlib is an implementation of a unified C library for accessing common file
formats, such as SAM, CRAM, VCF, and BCF, used for high-throughput sequencing
data. It is the core library used by samtools and bcftools.
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity
between sequences. The program compares nucleotide or protein sequences to
sequence databases and calculates the statistical significance of matches.
BLAST can be used to infer functional and evolutionary relationships between
sequences as well as help identify members of gene families.
OK wiz@
The actual fix has been done by "pkglint -F */*/buildlink3.mk", and was
reviewed manually.
There are some .include lines that still are indented with zero spaces
although the surrounding .if is indented. This is existing practice.
Unsorted entries in PLIST files have generated a pkglint warning for at
least 12 years. Somewhat more recently, pkglint has learned to sort
PLIST files automatically. Since pkglint 5.4.23, the sorting is only
done in obvious, simple cases. These have been applied by running:
pkglint -Cnone,PLIST -Wnone,plist-sort -r -F
This has been a pkglint warning for several years now, and pkglint can even
fix it automatically. And it did for this commit.
Only in lang/mercury, two passes of autofixing were necessary because there
were nested variables.
1.72 2016-09-02 06:50:53-05:00 America/Chicago
* Full release (no changes from 1.71 beyond version)
1.71 2016-09-01 22:57:25-05:00 America/Chicago (TRIAL RELEASE)
* Minor bump for impending BioPerl 1.7 release.
* #2 : 'Unescaped left brace in regex is deprecated' with newer versions of perl
fixed [fjossandon]
Add p5-Text-Diff as test dependency.
1.7.2 - "Entebbe"
[Bugs]
* #247 - Omit unnecessary parent_id attribute added by GFF3Loader [nathanweeks]
* #245 - Code coverage fixes [zmughal,cjfields]
* #237 - Fix warning in Bio::DB::IndexedBase [willmclaren,bosborne]
* #238 - Use a Travis cron job for network tests [zmughal,cjfields]
* #218 - Bio::DB::Flat::BinarySearch should use _fh() instead of fh() as fh() does not take arguments in [thibauthourlier,bosborne]
* #227 - Bio::SeqIO Ignores first line of sequence [VAR121,bosborne]
* #223 - Use Travis Perl helper script and enable coverage [zmughal,cjfields]
* #222 - Fix test RemoteDB/Taxonomy.t: requires networking [zmughal,cjfields]
* #216 - Apply carsonhh's patch (Inline::C fixes) [carsonh,bosborne]
* #213 - Support FTS5 in Bio::DB::SeqFeature::Store::DBI::SQLite [nathanweeks,bosborne]
* #210 - Sorting qualifiers while writing EMBL files [hdevillers,cjfields]
* #209 - Fixed bug in _toDsspKey() [jvolkening,hlapp]
[Code changes]
* PAML-related code from bioperl and bioperl-run are now in a separate distribution on CPAN [carandraug]
MASTER_SITES= site1 \
site2
style continuation lines to be simple repeated
MASTER_SITES+= site1
MASTER_SITES+= site2
lines. As previewed on tech-pkg. With thanks to rillig for fixing pkglint
accordingly.
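The rewrite described above, from a backslash-continued assignment to repeated `+=` lines, can be sketched roughly as follows. This is only an illustration of the transformation, not the script used for the commit; the function name is hypothetical, and it mirrors the example above by emitting `+=` for every value.

```python
def split_continuation(assignment):
    """Turn 'VAR= a \\<newline> b' into one 'VAR+= value' line per value."""
    # Join the continuation lines into a single logical line.
    joined = assignment.replace("\\\n", " ")
    name, _, values = joined.partition("=")
    # Accept both 'VAR=' and 'VAR+=' as input.
    name = name.rstrip("+").strip()
    return [f"{name}+=\t{value}" for value in values.split()]


text = "MASTER_SITES=\tsite1 \\\n\t\tsite2"
for line in split_continuation(text):
    print(line)
```

The sketch ignores comments and other assignment operators, which a real conversion would have to handle.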
Upstream changes:
1.7.1 - "Election"
[Bugs]
* Minor release to incorporate fix for CPAN indexing, which
prevented proper updates [cjfields]
* Fix problem in managing Target attribute for gff3 [Jukes34]
* Minor bug fixes related to NCBI HTTPS support [cjfields]
1.7.0 - "Disney"
[New site]
* We have migrated to Github Pages. This was actually planned, but the
recent OBF server compromise forced our hand.
Brian Osborne [bosborne] took this under his wing to move docs and has
done a tremendous amount of work formatting the site and working out some
of the idiosyncrasies with the new Jekyll-based design. Mark Jensen, Paul
Cantalupo and Franscison Ossandon also helped. Kudos!!
* Similarly, the official issue tracker is now Github Issues. This has
been updated in the relevant documentation bits (we hope!)
[Code changes]
* Previously deprecated modules removed
* Bio::Tools::Infernal, Bio::Tools::ERPIN, Bio::Tools::RNAMotif
* Bio::DB::SeqHound has been removed due to the service no longer being
available
* Bio::Tools::Analysis::Protein::Mitoprot has been removed for security
reasons due to the server no longer having a valid cert
* Bio::EUtilities, Bio::Biblio are now separate releases on CPAN
* Bio::Coordinate, Bio::SearchIO::blastxml,
Bio::SearchIO::Writer::BSMLResultWriter are now separate releases to be
added on CPAN
[New features]
* Docker instances of tagged releases are available! [hlapp]
* NCBI HTTPS support [mjohnson and others]
* Bio::SearchIO::infernal
- Issue #131: added CMSEARCH parsing support for Infernal 1.1 [pcantalupo]
* Bio::Search::HSP::ModelHSP
- Added a 'noncanonical_string' method to retrieve the NC line from CMSEARCH
reports [pcantalupo]
* Bio::Search::Result::INFERNALResult
- Added new module to represent features of Infernal reports [pcantalupo]
* Bio::DB::Taxonomy SQLite option [cjfields]
* WrapperBase quoted option values [majensen]
* Various documentation fixes and updates [bosborne]
[Bug Fixes]
* Fixes in Bio::Root::Build to deal with META.json/yml for CPAN indexing [cjfields]
* Bio::SeqFeature::Generic spliced_seq() bug fix [Eric Snyder, via bosborne]
* NeXML parser fixes [fjossandon]
* Bug fix for Bio::DB::SeqFeature memory adapter [lstein]
* RT 103272 : SeqFeature database deletion skipped features with a decimal -
Joshua Fortriede (Xenbase)
* RT 98374: AlignIO issues with sequence names not correctly parsing - Xiaoyu Zhuo
* Issue #70: CONTIG parsing in GenBank output fixed [fjossandon]
* Issue #76: Circular genome fixes with Bio::Location::Split [fjossandon]
* Issue #80: Fix lack of caching issue with Bio::DB::Taxonomy [fjossandon]
* Issue #81: Small updates to make sure possible memory leaks are detected [cjfields]
* Issue #84: EMBL format wrapping problem [nyamned]
* Issue #90: Missing entries for translation tables 24 and 25 [fjossandon]
* Issue #95: Speed up of Bio::DB::Fasta::subseq by using a compiled regex
or compiled C code (when Inline::C is installed) [rocky]
* Fix various Bio::Tools::Analysis remote server config problems [cjfields]
* Added several missing 'Data::Stag' and 'LWP::UserAgent' requirements [fjossandon]
* Added a workaround in Bio::DB::Registry to get Username in Windows [fjossandon]
* For HMMer report parsing, changed "$hsp->bits" to return 0 instead of undef
to be consistent with "$hit->bits" behaviour [fjossandon]
* Fixed a bug in HMMer3 parsing, where a homology line ending in CS or RF
amino acids made "next_seq" confused and broke the parser [fjossandon]
* Adjusted FTLocationFactory.pm to comply with current GenBank Feature Table
Definition, so now "join(complement(C..D),complement(A..B))" is equivalent
to "complement(join(A..B,C..D))" [fjossandon]
* For the many many many fixes that weren't mentioned - blame the release guy!
1. Compile C code with the C compiler, not the fortran compiler.
2. Use f2c, not g95, as the fortran compiler.
XXX This package builds only with f2c, not g95.
XXX There does not appear to be any way to specify this other
XXX than by abusively setting PKGSRC_FORTRAN. So do that for now.
Existing SHA1 digests verified, all found to be the same on the
machine holding the existing distfiles (morden). Existing SHA1
digests retained for now as an audit trail.
Packaged in pkgsrc-wip by Jason Bacon.
BWA is a software package for mapping low-divergent sequences against a large
reference genome, such as the human genome.
Gabedit is a graphical user interface to computational chemistry
packages like Gamess-US, Gaussian, Molcas, Molpro, MPQC,
OpenMopac, Orca, PCGamess and Q-Chem.
It can display a variety of calculation results including
support for most major molecular file formats.
The advanced "Molecule Builder" allows you to rapidly sketch
molecules and examine them in 3D. Graphics can be exported to
various formats, including animations.
Major features
* Gabedit can create input files for GAMESS(US), GAUSSIAN,
MOLCAS, MOLPRO, MPQC, OpenMopac, Orca, PCGamess and Q-Chem.
* Gabedit can graphically display a variety of Gamess-US,
Gaussian, Molcas, Molpro, MPQC, OpenMopac, Orca, PCGamess,
Q-Chem, (partially) ErgoSCF and (partially) ADF calculation
results, including the following:
+ Molecular orbitals.
+ Surfaces from the electron density, electrostatic
potential, NMR shielding density, and other properties.
+ Surfaces may be displayed in solid, translucent and wire
mesh modes. They can be colorcoded by a separate property.
+ Contours (colorcoded), Planes colorcoded, Dipole, XYZ axes
and the principal axes of the molecule.
+ Animation of the normal modes corresponding to vibrational
frequencies.
+ Animation of the rotation of geometry, surfaces, contours,
planes colorcoded, xyz and the principal axes of the molecule.
+ Animation of contours, Animation of planes colorcoded.
* Gabedit can display UV-Vis, IR and Raman computed spectra.
* Gabedit can generate a povray file for geometry (including
hydrogen bonds), surfaces (including colorcoded surfaces),
contours, planes colorcoded.
* Gabedit can save pictures in BMP, JPEG, PNG, PPM and PS formats.
* Gabedit can automatically generate a series of pictures
for animation (vibration, geometry convergence, rotation, contours,
planes colorcoded).
* Simulated Annealing with Molecular Dynamics is implemented in Gabedit
(using Amber 99 molecular mechanics parameters).
{perl>=5.16.6,p5-ExtUtils-ParseXS>=3.15}:../../devel/p5-ExtUtils-ParseXS
since pkgsrc enforces the newest perl version anyway, so they
should always pick perl, but sometimes (pkg_add) don't due to the
design of the {,} syntax.
No effective change for the above reason.
Ok joerg
(ChangeLog)
2012-08-20 Paolo Tosco <paolo.tosco@unito.it>
* src/formats/mol2format.cpp: added a check for N.4 nitrogens
(fixes PR#3557898)
2012-06-09 Paolo Tosco <paolo.tosco@unito.it>
* src/kekulize.cpp: reverted the r4862 patch to kekulize.cpp;
the incorrect aromaticity perception of oxonium salts concerned
only the MOL2 format, so the fix was applied to mol2format.cpp
instead
* src/formats/mol2format.cpp: added a check to improve downstream
aromaticity perception on charged molecules containing oxygen
2012-06-07 Paolo Tosco <paolo.tosco@unito.it>
* include/openbabel/atom.h: added protos for CountFreeSulfurs() and
IsThiocarboxylSulfur() functions which are equivalent to
CountFreeOxygens() and IsCarboxylOxygen() and address
(di)thiocarboxyl groups
* src/atom.cpp: added the CountFreeSulfurs() and
IsThiocarboxylSulfur() functions
* src/forcefields/forcefieldmmff94.cpp: added some additional
checks to make MMFF94 atom type assignment more robust
* src/formats/mol2format.cpp: added some checks to improve downstream
aromaticity perception on charged molecules containing nitrogen,
oxygen and sulfur
* src/kekulize.cpp: added a check to fix incorrect perception of
aromatic oxonium and thionium cations
(NEWS)
Open Babel 2.3.1 (2011-10-14)
This release represents a major bug-fix release and is a stable
upgrade, strongly recommended for all users of Open Babel. Many bug
fixes and enhancements have been added since the 2.3.0 release.
Citation:
Please consider citing this work if you publish work which used Open Babel:
Noel M. O'Boyle, Michael Banck, Craig A. James, Chris Morley, Tim
Vandermeersch and Geoffrey R. Hutchison. "Open Babel: An open
chemical toolbox." Journal of Cheminformatics 2011, 3:33.
http://dx.doi.org/10.1186/1758-2946-3-33