Version 1.11.10 (also included in 2.0.1)
---------------
* Fix detection of cores and hyperthreads on Mac OS X.
* Serialize pciaccess discovery to fix concurrent topology loads in
multiple threads.
* Fix first touch area memory binding on Linux when thread memory
binding is different.
* Some minor fixes to memory binding.
* Fix hwloc-dump-hwdata to only process SMBIOS information that corresponds
to the KNL and KNM configuration.
* Add a heuristic for guessing KNL/KNM memory and cluster modes when
hwloc-dump-hwdata could not run as root earlier.
* Fix discovery of NVMe OS devices on Linux >= 4.0.
* Add get_area_memlocation() on Windows.
* Add CPUVendor, Model, ... attributes on Mac OS X.
Version 1.11.9
--------------
* Add support for Zhaoxin ZX-C and ZX-D processors in the x86 backend,
thanks to Jeff Zhao for the patch.
* Fix AMD Epyc 24-core L3 cache locality in the x86 backend.
* Don't crash in the x86 backend when the CPUID vendor string is unknown.
* Fix the missing PU discovery support bit on some OSes.
* Fix the management of the lstopoStyle info attribute for custom colors.
* Add verbose warnings when failing to load hwloc v2.0+ XMLs.
Version 1.11.8
--------------
* Multiple Solaris improvements, thanks to Maureen Chew for the help:
+ Detect caches on Sparc.
+ Properly detect allowed/disallowed PUs and NUMA nodes with processor sets.
+ Add hwloc_get_last_cpu_location() support for the current thread.
* Add support for CUDA compute capability 7.0 and fix support for 6.[12].
* Tools improvements
+ Fix search for objects by physical index in command-line tools.
+ Add missing "cpubind:get_thisthread_last_cpu_location" in the output
of hwloc-info --support.
+ Add --pid and --name to specify target processes in hwloc-ps.
+ Display thread names in lstopo and hwloc-ps on Linux.
* Doc improvements
+ Add a FAQ entry about building on Windows.
+ Install missing sub-manpage for hwloc_obj_add_info() and
hwloc_obj_get_info_by_name().
Performing substitutions during post-patch breaks tools such as mkpatches,
making it very difficult to regenerate correct patches after making changes,
and often leading to substituted string replacements being committed.
paexec:
- add new option -0. It works just like in "xargs -0".
- add new option -J.
- add new option -mw=.
- fix help message display by -h.
- -md= now allows no delimiter mode in -g mode.
- -c and -C override each other if one is implied after another.
Add new tool "paargs". It is a wrapper over paexec(1) that
simplifies use of paexec.
Fix transport_broken_rnd test script.
This fixes regression test on Solaris.
Update man page for paexec(1).
SLURM is an open-source resource manager designed for Linux clusters of
all sizes. It provides three key functions. First, it allocates exclusive
and/or non-exclusive access to resources (computer nodes) to users for some
duration of time so they can perform work. Second, it provides a framework
for starting, executing, and monitoring work (typically a parallel job) on
a set of allocated nodes. Finally, it arbitrates contention for resources
by managing a queue of pending work.
Renamed from parallel/slurm.
OK wiz@
Changes since 2.6.4:
Adds additional capabilities such as SQL accounting and job profiling
Change maintainer to bacon@NetBSD.org
Install example Linux init scripts
The actual fix has been done by "pkglint -F */*/buildlink3.mk", and was
reviewed manually.
There are some .include lines that still are indented with zero spaces
although the surrounding .if is indented. This is existing practice.
1.10.7:
- Fix bug in TCP BTL that impacted performance on 10GbE (and faster)
networks by not adjusting the TCP send/recv buffer sizes and using
system default values
- Add missing MPI_AINT_ADD and MPI_AINT_DIFF function declarations in
mpif.h
- Fixed time reported by MPI_WTIME; it was previously reported as
dependent upon the CPU frequency.
- Fix platform detection on FreeBSD
- Fix a bug in the handling of MPI_TYPE_CREATE_DARRAY in
MPI_(R)(GET_)ACCUMULATE
- Fix openib memory registration limit calculation
- Add missing MPI_T_PVAR_SESSION_NULL in mpi.h
- Fix "make distcheck" when using external hwloc and/or libevent packages
- Add latest ConnectX-5 vendor part id to OpenIB device params
- Fix race condition in the UCX PML
- Fix signal handling for rsh launcher
- Fix Fortran compilation errors by removing MPI_SIZEOF in the Fortran
interfaces when the compiler does not support it
- Fixes for the pre-ignore-TKR "mpi" Fortran module implementation
(i.e., for older Fortran compilers -- these problems did not exist
in the "mpi" module implementation for modern Fortran compilers):
- Add PMPI_* interfaces
- Fix typo in MPI_FILE_WRITE_AT_ALL_BEGIN interface name
- Fix typo in MPI_FILE_READ_ORDERED_BEGIN interface name
- Fixed the type of MPI_DISPLACEMENT_CURRENT in all Fortran interfaces
to be an INTEGER(KIND=MPI_OFFSET_KIND).
- Fixed typos in MPI_INFO_GET_* man pages. Thanks to Nicolas Joly for
the patch
- Fix typo bugs in wrapper compiler script
Unsorted entries in PLIST files have generated a pkglint warning for at
least 12 years. Somewhat more recently, pkglint has learned to sort
PLIST files automatically. Since pkglint 5.4.23, the sorting is only
done in obvious, simple cases. These have been applied by running:
pkglint -Cnone,PLIST -Wnone,plist-sort -r -F
This has been a pkglint warning for several years now, and pkglint can even
fix it automatically. And it did for this commit.
Only in lang/mercury, two passes of autofixing were necessary because there
were nested variables.
- Adds Process._authkey alias to .authkey for 2.7 compat.
- Remove superfluous else clause from max_memory_per_child_check.
- Document and test all supported Python versions.
- Extend 'Process' to be compatible with < Py3.5.
- Use a properly initialized logger in pool.py error logging.
- _trywaitkill can now kill a whole process group if the worker process declares itself as a group leader.
- Fix cpython issue 14881 (See http://bugs.python.org/issue14881).
- Fix for a crash on Windows.
- Fix messaging when a worker exceeds max memory.
* Added support for MPI-3.1 features including nonblocking collective I/O,
address manipulation routines, thread-safety for MPI initialization,
pre-init functionality, and new MPI_T routines to look up variables
by name.
* Fortran 2008 bindings are enabled by default and fully supported.
* Added support for the Mellanox MXM InfiniBand interface. (thanks
to Mellanox for the code contribution).
* Added support for the Mellanox HCOLL interface for collectives.
(thanks to Mellanox for the code contribution).
* Significant stability improvements to the MPICH/portals4
implementation.
* Completely revamped RMA infrastructure including several
scalability improvements, performance improvements, and bug fixes.
* Added experimental support for Open Fabrics Interfaces (OFI) version 1.0.0.
https://github.com/ofiwg/libfabric (thanks to Intel for code contribution)
* The Myrinet MX network module, which had a life cycle from 1.1 till
3.1.2, has now been deleted.
* Several other minor bug fixes, memory leak fixes, and code cleanup.
--------------
* Fix hwloc-bind --membind for CPU-less NUMA nodes (again).
Thanks to Gilles Gouaillardet for reporting the issue.
* Fix a memory leak on IBM S/390 platforms running Linux.
* Fix a memory leak when forcing the x86 backend first on amd64/topoext
platforms running Linux.
* Command-line tools now support "hbm" instead of "numanode" for filtering
only high-bandwidth memory nodes when selecting locations.
+ hwloc-bind also supports --hbm and --no-hbm for filtering only or
no HBM nodes.
* Add --children and --descendants to hwloc-info for listing object
children or object descendants of a specific type.
* Add --no-index, --index, --no-attrs, --attrs to disable/enable display
of index numbers or attributes in the graphical lstopo output.
* Try to gather hwloc-dump-hwdata output from all possible locations
in hwloc-gather-topology.
* Updates to the documentation of locations in hwloc(7) and
command-line tools manpages.
- max_memory_per_child was measured in kilobytes on Linux, but in bytes on
*BSD/MacOS; it's now always kilobytes.
- Windows: Adds support for max_memory_per_child, but requires the
``psutil`` package to be installed.
- Fixed bug in ForkingPickler.loadbuf, where it tried to pass
a BytesIO instance directly to ``pickle.loads`` on Python 2.7.
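The unit change for max_memory_per_child noted above can be illustrated with
a minimal sketch. It assumes the limit is derived from the ru_maxrss field of
getrusage(), which is in kilobytes on Linux and in bytes on macOS. The helper
names and the byte_platforms parameter are hypothetical and are not part of
the package's API.

```python
import sys


def maxrss_kilobytes(raw_maxrss, platform=sys.platform,
                     byte_platforms=("darwin",)):
    """Normalize a ru_maxrss reading to kilobytes.

    Assumption: platforms whose name starts with one of the prefixes in
    byte_platforms report bytes; all others report kilobytes. Only macOS
    ("darwin") is listed by default; extend the tuple as needed.
    """
    if platform.startswith(tuple(byte_platforms)):
        return raw_maxrss // 1024
    return raw_maxrss


def current_maxrss_kilobytes():
    """Return the peak resident set size of this process in kilobytes."""
    # Imported here because the resource module is Unix-only.
    import resource
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return maxrss_kilobytes(usage.ru_maxrss)
```

With a normalized reading, a limit can be expressed in the same unit on
every platform, for example "current_maxrss_kilobytes() > limit_kb".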
MASTER_SITES= site1 \
site2
style continuation lines to be simple repeated
MASTER_SITES+= site1
MASTER_SITES+= site2
lines. As previewed on tech-pkg. With thanks to rillig for fixing pkglint
accordingly.
Prompted by Nicolas Joly in private mail.
1.10.4 - 01 Sept 2016
------
- Fix assembler support for MIPS
- Improve memory handling for temp buffers in collectives
- Fix [all]reduce with non-zero lower bound datatypes
Thanks Hristo Iliev for the report
- Fix non-standard ddt handling. Thanks Yuki Matsumoto for the report
- Various libnbc fixes. Thanks Yuki Matsumoto for the report
- Fix typos in request RMA bindings for Fortran. Thanks to @alazzaro
and @vondele for the assist
- Various bug fixes and enhancements to collective support
- Fix predefined types mapping in hcoll
- Revive the coll/sync component to resolve unexpected message issues
during tight loops across collectives
- Fix typo in wrapper compiler for Fortran static builds
1.10.3 - 15 June 2016
------
- Fix zero-length datatypes. Thanks to Wei-keng Liao for reporting
the issue.
- Minor manpage cleanups
- Implement atomic support in OSHMEM/UCX
- Fix support of MPI_COMBINER_RESIZED. Thanks to James Ramsey
for the report
- Fix computation of #cpus when --use-hwthread-cpus is used
- Add entry points for Allgatherv, iAllgatherv, Reduce, and iReduce
for the HCOLL library
- Fix an HCOLL integration bug that could signal completion of a request
that was still in progress
- Fix computation of cores when SMT is enabled. Thanks to Ben Menadue
for the report
- Various USNIC fixes
- Create a datafile in the per-proc directory in order to make it
unique per communicator. Thanks to Peter Wind for the report
- Fix zero-size malloc in one-sided pt-to-pt code. Thanks to Lisandro
Dalcin for the report
- Fix MPI_Get_address when passed MPI_BOTTOM to not return an error.
Thanks to Lisandro Dalcin for the report
- Fix MPI_TYPE_SET_ATTR with NULL value. Thanks to Lisandro Dalcin for
the report
- Fix various Fortran08 binding issues
- Fix memchecker no-data case. Thanks to Clinton Stimpson for the report
- Fix CUDA support under OS-X
- Fix various OFI/MTL integration issues
- Add MPI_T man pages
- Fix one-sided pt-to-pt issue by preventing communication from happening
before a target enters a fence, even in the no-precede case
- Fix a bug that disabled Totalview for MPMD use-case
- Correctly support MPI_UNWEIGHTED in topo-graph-neighbors. Thanks to
Jun Kudo for the report
- Fix singleton operations under SLURM when PMI2 is enabled
- Do not use MPI_IN_PLACE in neighborhood collectives for non-blocking
collectives (libnbc). Thanks to Jun Kudo for the report
- Silence autogen deprecation warnings for newer versions of Perl
- Do not return MPI_ERR_PENDING from collectives
- Use type int* for MPI_WIN_DISP_UNIT, MPI_WIN_CREATE_FLAVOR, and MPI_WIN_MODEL.
Thanks to Alastair McKinstry for the report
- Fix register_datarep stub function in IO/OMPIO. Thanks to Eric
Chamberland for the report
- Fix a bus error on MPI_WIN_[POST,START] in the shared memory one-sided component
- Add several missing MPI_WIN_FLAVOR constants to the Fortran support
- Enable connecting processes from different subnets using the openib BTL
- Fix bug in basic/barrier algorithm in OSHMEM
- Correct process binding for the --map-by node case
- Include support for subnet-to-subnet routing over InfiniBand networks
- Fix usnic resource check
- AUTHORS: Fix an errant reference to Subversion IDs
- Fix affinity for MPMD jobs running under LSF
- Fix many Fortran binding bugs
- Fix `MPI_IN_PLACE`-related bugs
- Fix PSM/PSM2 support for singleton operations
- Ensure MPI transports continue to progress during RTE barriers
- Update HWLOC to 1.9.1 end-of-series
- Fix a bug in the Java command line parser when the
-Djava.library.path option was given by the user
- Update the MTL/OFI provider selection behavior
- Add support for clock_gettime on Linux.
- Correctly detect and configure for Solaris Studio 12.5
beta compilers
- Correctly compute #slots when -host is used for MPMD case
- Fix a bug in the hcoll collectives due to an uninitialized field
- Do not set a binding policy when oversubscribing a node
- Fix hang in intercommunicator operations when oversubscribed
- Speed up process termination during MPI_Abort
- Disable backtrace support by default in the PSM/PSM2 libraries to
prevent unintentional conflicting behavior.
1.10.2: 26 Jan 2016
-------------------
**********************************************************************
* OSHMEM is now 1.2 compliant
**********************************************************************
- Fix NBC_Copy for legitimate zero-size messages
- Fix multiple bugs in OSHMEM
- Correctly handle mpirun --host <user>@<ip-address>
- Centralize two MCA params to avoid duplication between OMPI and
OSHMEM layers: opal_abort_delay and opal_abort_print_stack
- Add support for Fujitsu compilers
- Add UCX support for OMPI and OSHMEM
- Correctly handle oversubscription when not given directives
to permit it. Thanks to @ammore1 for reporting it
- Fix rpm spec file to not include the /usr directory
- Add Intel HFI1 default parameters for the openib BTL
- Resolve symbol conflicts in the PSM2 library
- Add ability to empty the rgpusm cache when full if requested
- Fix another libtool bug when -L requires a space between it
and the path. Thanks to Eric Schnetter for the patch.
- Add support for OSHMEM v1.2 APIs
- Improve efficiency of oshmem_preconnect_all algorithm
- Fix bug in buffered sends support
- Fix double free in edge case of mpirun. Thanks to @jsharpe for
the patch
- Multiple one-sided support fixes
- Fix integer overflow in the tuned "reduce" collective when
using buffers larger than INT_MAX in size
- Fix parse of user environment variables in mpirun. Thanks to
Stefano Garzarella for the patch
- Performance improvements in PSM2 support
- Fix NBS iBarrier for inter-communicators
- Fix bug in vader BTL during finalize
- Improved configure support for Fortran compilers
- Fix rank_file mapper to support default --slot-set. Thanks
to Matt Thompson for reporting it
- Update MPI_Testsome man page. Thanks to Eric Schnetter for
the suggestion
- Fix missing resize of the returned type for subarray and
darray types. Thanks to Keith Bennett and Dan Garmann for
reporting it
- Fix Java support on OSX 10.11. Thanks to Alexander Daryin
for reporting the problem
- Fix some compilation issues on Solaris 11.2. Thanks to
Paul Hargrove for his continued help in such areas
Version 1.11.4
--------------
* Add MemoryMode and ClusterMode attributes in the Machine object on KNL.
Add doc/examples/get-knl-modes.c for an example of retrieving them.
Thanks to Grzegorz Andrejczuk.
* Fix Linux build with -m32 with respect to libudev.
Thanks to Paul Hargrove for reporting the issue.
* Fix build with Visual Studio 2015, thanks to Eloi Gaudry for reporting
the issue and providing the patch.
* Don't forget to display OS device children in the graphical lstopo.
* Fix a memory leak on Solaris, thanks to Bryon Gloden for the patch.
* Properly handle realloc() failures, thanks to Bryon Gloden for reporting
the issue.
* Fix lstopo crash in ascii/fig/windows outputs when some objects have a
lstopoStyle info attribute.
Version 1.11.3
--------------
* Bug fixes
+ Fix a memory leak on Linux S/390 hosts with books.
+ Fix /proc/mounts parsing on Linux by using mntent.h.
Thanks to Nathan Hjelm for reporting the issue.
+ Fix an x86 infinite loop on VMware due to the x2APIC feature being
advertised without actually being fully supported.
Thanks to Jianjun Wen for reporting the problem and testing the patch.
+ Fix the return value of hwloc_alloc() on mmap() failure.
Thanks to Hugo Brunie for reporting the issue.
+ Fix the return value of command-line tools in some error cases.
+ Do not break individual thread bindings during x86 backend discovery in a
multithreaded process. Thanks to Farouk Mansouri for the report.
+ Fix hwloc-bind --membind for CPU-less NUMA nodes.
+ Fix some corner cases in the XML export/import of application userdata.
* API Improvements
+ Add HWLOC_MEMBIND_BYNODESET flag so that membind() functions accept
either cpusets or nodesets.
+ Add hwloc_get_area_memlocation() to check where pages are actually
allocated. Only implemented on Linux for now.
- There's no _nodeset() variant, but the new flag HWLOC_MEMBIND_BYNODESET
is supported.
+ Make hwloc_obj_type_sscanf() parse back everything that may be outputted
by hwloc_obj_type_snprintf().
* Detection Improvements
+ Allow the x86 backend to add missing cache levels, so that it completes
what the Solaris backend lacks.
Thanks to Ryan Zezeski for reporting the issue.
+ Do not filter-out FibreChannel PCI adapters by default anymore.
Thanks to Matt Muggeridge for the report.
+ Add support for CUDA compute capability 6.x.
* Tools
+ Add --support to hwloc-info to list supported features, just like with
hwloc_topology_get_support().
- Also add --objects and --topology to explicitly switch between the
default modes.
+ Add --tid to let hwloc-bind operate on individual threads on Linux.
+ Add --nodeset to let hwloc-bind report memory binding as NUMA node sets.
+ hwloc-annotate and lstopo don't drop application userdata from XMLs anymore.
- Add --cu to hwloc-annotate to drop these application userdata.
+ Make the hwloc-dump-hwdata dump directory configurable through configure
options such as --runstatedir or --localstatedir.
* Misc Improvements
+ Add systemd service template contrib/systemd/hwloc-dump-hwdata.service
for launching hwloc-dump-hwdata at boot on Linux.
Thanks to Grzegorz Andrejczuk.
+ Add HWLOC_PLUGINS_BLACKLIST environment variable to prevent some plugins
from being loaded. Thanks to Alexandre Denis for the suggestion.
+ Small improvements for various Windows build systems,
thanks to Jonathan L Peyton and Marco Atzeri.
Version 1.11.2
--------------
* Improve support for Intel Knights Landing Xeon Phi on Linux:
+ Group local NUMA nodes of normal memory (DDR) and high-bandwidth memory
(MCDRAM) together through "Cluster" groups so that the local MCDRAM is
easy to find.
- See "How do I find the local MCDRAM NUMA node on Intel Knights
Landing Xeon Phi?" in the documentation.
- For uniformity across all KNL configurations, always have a NUMA node
object even if the host is UMA.
+ Fix the detection of the memory-side cache:
- Add the hwloc-dump-hwdata superuser utility to dump SMBIOS information
into /var/run/hwloc/ as root during boot, and load this dumped
information from the hwloc library at runtime.
- See "Why do I need hwloc-dump-hwdata for caches on Intel Knights
Landing Xeon Phi?" in the documentation.
Thanks to Grzegorz Andrejczuk for the patches and for the help.
* The x86 and linux backends may now be combined for discovering CPUs
through x86 CPUID and memory from the Linux kernel.
This is useful for working around buggy CPU information reported by Linux
(for instance the AMD Bulldozer/Piledriver bug below).
Combination is enabled by passing HWLOC_COMPONENTS=x86 in the environment.
* Fix L3 cache sharing on AMD Opteron 63xx (Piledriver) and 62xx (Bulldozer)
in the x86 backend. Thanks to many users who helped.
* Fix the overzealous L3 cache sharing fix added to the x86 backend in 1.11.1
for AMD Opteron 61xx (Magny-Cours) processors.
* The x86 backend may now add the info attribute Inclusive=0 or 1 to caches
it discovers, or to caches discovered by other backends earlier.
Thanks to Guillaume Beauchamp for the patch.
* Fix the management of alloc_membind() allocation failures on AIX, HP-UX
and OSF/Tru64.
* Fix spurious failures to load with ENOMEM on AIX in case of Misc objects
below PUs.
* lstopo improvements in X11 and Windows graphical mode:
+ Add + - f 1 shortcuts to manually zoom-in, zoom-out, reset the scale,
or fit the entire window.
+ Display all keyboard shortcuts in the console.
* Debug messages may be disabled at runtime by passing HWLOC_DEBUG_VERBOSE=0
in the environment when --enable-debug was passed to configure.
* Add a FAQ entry "What are these Group objects in my topology?".