* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
bug report and patch, and to Graham Dennis for the bug report).
* Use 128-bit ARM NEON instructions instead of 64-bits. This change
appears to speed up even ARM processors with a 64-bit NEON pipe.
* Speed improvements for single-precision AVX.
* Speed up planner on machines without "official" cycle counters, such as ARM.
FFTW 3.3.2
* Removed an archaic stack-alignment hack that was failing with
gcc-4.7/i386.
* Added stack-alignment hack necessary for gcc on Windows/i386. We
will regret this in ten years (see previous change).
* Fix incompatibility with Intel icc which pretends to be gcc
but does not support quad precision.
* make libfftw{threads,mpi} depend upon libfftw when using libtool;
this is consistent with most other libraries and simplifies the life
of various distributors of GNU/Linux.
* Reduced planning time in estimate mode for sizes with large prime factors.
* Added AVX autodetection under Visual Studio.
* Modern Fortran interface now uses a separate fftw3l.f03 interface file for
the long double interface, which is not supported by some Fortran compilers.
Provided new fftw3q.f03 interface file to access the quadruple-precision FFTW
routines with recent versions of gcc/gfortran.
* Added support for the NEON extensions to the ARM ISA.
* MPI code now compiles even if mpicc is a C++ compiler.
* Compiling OpenMP support (--enable-openmp) now installs a fftw3_omp library,
instead of fftw3_threads, so that OpenMP and POSIX threads (--enable-threads)
libraries can be built and installed at the same time.
* Various minor compilation fixes, corrections of manual typos, and
improvements to the benchmark test program.
* Add support for the AVX extensions to x86 and x86-64. The AVX code works with
16-byte alignment (as opposed to 32-byte alignment), so there is no ABI
change compared to FFTW 3.2.2.
* Added Fortran 2003 interface, which should be usable on most modern Fortran
compilers (e.g. gfortran) and provides type-checked access to the the C FFTW
interface. (The legacy Fortran-77 interface is still included also.)
* Added MPI distributed-memory transforms. Compared to 3.3alpha, the major
changes in the MPI transforms are:
* Fixed some deadlock and crashing bugs.
* Added Fortran 2003 interface.
* Added new-array execute functions for MPI plans.
* Eliminated use of large MPI tags, since Cray MPI requires tags < 224.
* Expanded documentation.
* make check now runs MPI tests
* Some ABI changes — not binary-compatible with 3.3alpha MPI.
* Add support for quad-precision __float128 in gcc 4.6 or later (on x86.
x86-64, and Itanium). The new routines use the fftwq_ prefix.
* Temporarily removed MIPS paired-single support due to lack of available
hardware for testing. We hope to add it back before the final FFTW 3.3
release; meanwhile, users who want this functionality should continue using
FFTW 3.2.x.
* Removed support for the Cell Broadband Engine. Cell users should use FFTW
3.2.x.
* New convenience functions fftw_alloc_real and fftw_alloc_complex to use
fftw_malloc for real and complex arrays without typecasts or sizeof.
FFTW 3.2.2
* Improve performance of some copy operations of complex arrays on
x86 machines.
* Add configure flag to disable alloca(), which is broken in mingw64.
* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
between fftw-3.1.3 and 3.2. This regression has now been fixed.
This changes the buildlink3.mk files to use an include guard for the
recursive include. The use of BUILDLINK_DEPTH, BUILDLINK_DEPENDS,
BUILDLINK_PACKAGES and BUILDLINK_ORDER is handled by a single new
variable BUILDLINK_TREE. Each buildlink3.mk file adds a pair of
enter/exit marker, which can be used to reconstruct the tree and
to determine first level includes. Avoiding := for large variables
(BUILDLINK_ORDER) speeds up parse time as += has linear complexity.
The include guard reduces system time by avoiding reading files over and
over again. For complex packages this reduces both %user and %sys time to
half of the former time.
* Performance improvements for some multidimensional r2c/c2r transforms;
thanks to Eugene Miloslavsky for his benchmark reports.
* Compile with icc on MacOS X, use better icc compiler flags.
* Compilation fixes for systems where snprintf is defined as a macro;
thanks to Marcus Mae for the bug report.
* Fortran documentation now recommends not using dfftw_execute,
because of reports of problems with various Fortran compilers;
it is better to use dfftw_execute_dft etcetera.
* Some documentation clarifications, e.g. of fact that --enable-openmp
and --enable-threads are mutually exclusive (thanks to Long To),
and document slightly odd behavior of plan_guru_r2r in Fortran.
* FAQ was accidentally omitted from 3.2 tarball.
* Remove some extraneous (harmless) files accidentally included in
a subdirectory of the 3.2 tarball.
* Worked around apparent glibc bug that leads to rare hangs when freeing
semaphores.
* Fixed segfault due to unaligned access in certain obscure problems
that use SSE and multiple threads.
* MPI transforms not included, as they are still in alpha; the alpha
versions of the MPI transforms have been moved to FFTW 3.3alpha1.
* Performance improvements for sizes with factors of 5 and 10.
* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Emmenlauer and Phil Dumont.
* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
for the suggestions.
* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
* Fixed incorrect type prefix in MPI code that prevented wisdom routines
from working in single precision (thanks to Eric A. Borisch for the report).
* Added 'make check' for MPI code (which still fails in a couple corner
cases, but should be much better than in alpha2).
* Many other small fixes.
* Bug fix: FFTW computes incorrect results when the user plans both
REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
by incorrect sharing of twiddle-factor tables between the two
transforms, and only occurs when both are used. Thanks to Paul
A. Valiant for the bug report.
and add a new helper target and script, "show-buildlink3", that outputs
a listing of the buildlink3.mk files included as well as the depth at
which they are included.
For example, "make show-buildlink3" in fonts/Xft2 displays:
zlib
fontconfig
iconv
zlib
freetype2
expat
freetype2
Xrender
renderproto
* Correct bug in configure script: --enable-portable-binary option was ignored!
* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
either if we are using gcc.
* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
and suggest a workaround. configure script now detects Core/Duo arch.
* Use -maltivec when checking for altivec.h.
* Performance improvements for Intel EMT64.
* Performance improvements for large-size transforms with SIMD.
* Cycle counter support for Intel icc and Visual C++ on x86-64.
* In fftw-wisdom tool, replaced obsolete --impatient with --measure.
* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
* Windows DLL support for Fortran API (added missing __declspec(dllexport)).
* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
CPUs lacking a CPUID instruction; thanks to Eric Korpela.
RECOMMENDED is removed. It becomes ABI_DEPENDS.
BUILDLINK_RECOMMENDED.foo becomes BUILDLINK_ABI_DEPENDS.foo.
BUILDLINK_DEPENDS.foo becomes BUILDLINK_API_DEPENDS.foo.
BUILDLINK_DEPENDS does not change.
IGNORE_RECOMMENDED (which defaulted to "no") becomes USE_ABI_DEPENDS
which defaults to "yes".
Added to obsolete.mk checking for IGNORE_RECOMMENDED.
I did not manually go through and fix any aesthetic tab/spacing issues.
I have tested the above patch on DragonFly building and packaging
subversion and pkglint and their many dependencies.
I have also tested USE_ABI_DEPENDS=no on my NetBSD workstation (where I
have used IGNORE_RECOMMENDED for a long time). I have been an active user
of IGNORE_RECOMMENDED since it was available.
As suggested, I removed the documentation sentences suggesting bumping for
"security" issues.
As discussed on tech-pkg.
I will commit to revbump, pkglint, pkg_install, createbuildlink separately.
Note that if you use wip, it will fail! I will commit to pkgsrc-wip
later (within day).
* Faster FFTW_ESTIMATE planner.
* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
* Faster in-place non-square transpositions (FFTW uses these internally
for in-place FFTs, and you can also perform them explicitly using
the guru interface).
* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
* SIMD support for split complex arrays.
* Much faster Altivec/VMX performance.
* New fftw_set_timelimit function to specify a (rough) upper bound to the
planning time (does not affect ESTIMATE mode).
* Removed --enable-3dnow support; use --enable-k7 instead.
* FMA (fused multiply-add) version is now included in "standard" FFTW,
and is enabled with --enable-fma (the default on PowerPC and Itanium).
* Automatic detection of native architecture flag for gcc. New
configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
for people distributing compiled binaries of FFTW (see manual).
* Automatic detection of Altivec under Linux with gcc 3.4 (so that
same binary should work on both Altivec and non-Altivec PowerPCs).
* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Solaris/Intel.
* Various documentation clarifications.
* 64-bit clean. (Fixes a bug affecting the split guru planner on
64-bit machines, reported by David Necas.)
* Fixed Debian bug no.259612: inadvertent use of SSE instructions on
non-SSE machines (causing a crash) for --enable-sse binaries.
* Fixed bug that caused HC2R transforms to destroy the input in
certain cases, even if the user specified FFTW_PRESERVE_INPUT.
* Fixed bug where wisdom would be lost under rare circumstances,
causing excessive planning time.
* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
* Fixed accidentally exported symbol that prohibited simultaneous
linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
* Support Win32 threads under MinGW (thanks to Alessio Massaro).
framework. The list of changes include:
* Modify compiler.mk so that "c" is always prepended to USE_LANGUAGES,
so we no longer need to say it in package Makefiles. Packages
should now append to USE_LANGUAGES instead of setting it.
* Create mk/compiler/f2c.mk which implements another pseudo-compiler
"f2c" that may be used with any C compiler backend, e.g.
PKGSRC_COMPILER= f2c ccache gcc
* Teach the various "real" compiler files, e.g., sunpro.mk, mipspro.mk,
etc., to use f2c if the native Fortran compiler isn't present.
Packages that use Fortran should now simply include the line:
USE_LANGUAGES+= fortran
in the package Makefile.
in the process. (More information on tech-pkg.)
Bump PKGREVISION and BUILDLINK_DEPENDS of all packages using libtool and
installing .la files.
Bump PKGREVISION (only) of all packages depending directly on the above
via a buildlink3 include.
All library names listed by *.la files no longer need to be listed
in the PLIST, e.g., instead of:
lib/libfoo.a
lib/libfoo.la
lib/libfoo.so
lib/libfoo.so.0
lib/libfoo.so.0.1
one simply needs:
lib/libfoo.la
and bsd.pkg.mk will automatically ensure that the additional library
names are listed in the installed package +CONTENTS file.
Also make LIBTOOLIZE_PLIST default to "yes".
* Some speed improvements in SIMD code.
* --without-cycle-counter option is removed. If no cycle counter is found,
then the estimator is always used. A --with-slow-timer option is provided
to force the use of lower-resolution timers.
* Added missing static keyword that prevented simultaneous linkage
of different-precision versions; thanks to Rasmus Larson for the bug report.
* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
thanks to Nicolas Decoster for the patch.
* Added 'make smallcheck' target in tests/ directory, at the request of
James Treacy.
Major goals of this release:
* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
* Complete rewrite, to make it easier to add new algorithms and transforms.
* New API, to support more general semantics.
Other enhancements:
* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
(With special thanks to Franz Franchetti for many experimental prototypes
and to Stefan Kral for the vectorizing generator from fftwgel.)
* True in-place 1d transforms of large sizes (as well as compressed
twiddle tables for additional memory/cache savings).
* More arbitrary placement of real & imaginary data, e.g. including
interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
* Efficient prime-size transforms of real data.
* Multidimensional transforms can operate on a subset of a larger matrix,
and/or transform selected dimensions of a multidimensional array.
* By popular demand, simultaneous linking to double precision (fftw),
single precision (fftwf), and long-double precision (fftwl) versions
of FFTW is now supported.
* Cycle counters (on all modern CPUs) are exploited to speed planning.
* Efficient transforms of real even/odd arrays, a.k.a. discrete
cosine/sine transforms (types I-IV). (Currently work via pre/post
processing of real transforms, ala FFTPACK, so are not optimal.)
* DHTs (Discrete Hartley Transforms), again via post-processing
of real transforms (and thus suboptimal, for now).
* Support for linking to just those parts of FFTW that you need,
greatly reducing the size of statically linked programs when
only a limited set of transform sizes/types are required.
* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
with a command-line tool (fftw-wisdom) to generate/update it.
* Fortran API can be used with both g77 and non-g77 compilers
simultaneously.
* Multi-threaded version has optional OpenMP support.
* Authors' good looks have greatly improved with age.