Version 1.6.3 is a minor bugfix release.
All users are encouraged to upgrade to v1.6.3 when possible.
Note that v1.6.3 is ABI compatible with the entire v1.5.x and
v1.6.x series, but is not ABI compatible with the v1.4.x series.
See http://www.open-mpi.org/software/ompi/versions/ for a
description of Open MPI's release methodology.
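The compatibility statement above can be read as a rule on release series. The following is only an illustrative sketch: `ABI_GROUPS`, `series` and `abi_compatible` are hypothetical names, not part of Open MPI, and the table encodes nothing beyond what this note states (v1.5.x and v1.6.x share an ABI, v1.4.x does not).

```python
# Illustrative only: encodes the ABI statement from the release note above.
# ABI_GROUPS, series and abi_compatible are hypothetical names.
ABI_GROUPS = {
    (1, 4): "abi-1.4",
    (1, 5): "abi-1.5",
    (1, 6): "abi-1.5",  # stated to be ABI compatible with v1.5.x
}


def series(version):
    """Return the (major, minor) series of a version string such as 'v1.6.3'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def abi_compatible(a, b):
    """True if both versions belong to series known to share an ABI."""
    group_a = ABI_GROUPS.get(series(a))
    group_b = ABI_GROUPS.get(series(b))
    return group_a is not None and group_a == group_b
```

Series that are not listed in the table are treated as incompatible, which is the conservative choice when nothing is known about them.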
Here is a list of changes in v1.6.3 as compared to v1.6.2:
- Fix mpirun --launch-agent behavior when a prefix is specified.
Thanks to Reuti for identifying the issue.
- Fixed memchecker configury.
- Brought over some compiler warning squashes from the development trunk.
- Fix spawning from a singleton to multiple hosts when the "add-host"
MPI_Info key is used. Thanks to Brian Budge for pointing out the
problem.
- Add Mellanox ConnectIB IDs and max inline value.
- Fix rankfile when no -np is given.
- FreeBSD detection improvement. Thanks to Brooks Davis for the
patch.
- Removed TCP warnings on Windows.
- Improved collective algorithm selection for very large messages.
- Fix PSM MTL affinity settings.
- Fix issue with MPI_OP_COMMUTATIVE in the mpif.h bindings. Thanks to
Ake Sandgren for providing a patch to fix the issue.
- Fix issue with MPI_SIZEOF when using CHARACTER and LOGICAL types in
the mpi module. Thanks to Ake Sandgren for providing a patch to fix
the issue.
Don't build VampirTrace anymore; it will be introduced as a separate package.
Changes in v1.6.2 as compared to v1.6.1:
- Fix issue with MX MTL. Thanks to Doug Eadline for raising the issue.
- Fix singleton MPI_COMM_SPAWN when the result job spans multiple nodes.
- Fix MXM hang, and update for latest version of MXM.
- Update to support Mellanox FCA 2.5.
- Fix startup hang for large jobs.
- Ensure MPI_TESTANY / MPI_WAITANY properly set the empty status when
count==0.
- Fix MPI_CART_SUB behavior of not copying periods to the new
communicator properly. Thanks to John Craske for the bug report.
- Add btl_openib_abort_not_enough_reg_mem MCA parameter to cause Open
MPI to abort MPI jobs if there is not enough registered memory
available on the system (vs. just printing a warning). Thanks to
Brock Palen for raising the issue.
- Minor fix to Fortran MPI_INFO_GET: only copy a value back to the
user's buffer if the flag is .TRUE.
- Fix VampirTrace compilation issue with the PGI compiler suite.
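The MPI_CART_SUB entry above concerns the periodicity flags of the dimensions that are kept in the new communicator. As a rough model of the expected behaviour (plain Python, not the Open MPI implementation; `cart_sub_layout` is a hypothetical helper), the sub-grid should inherit the sizes and the periods of exactly the dimensions selected by `remain_dims`:

```python
def cart_sub_layout(dims, periods, remain_dims):
    """Return (dims, periods) of the sub-grid selected by remain_dims.

    Models only the bookkeeping: every kept dimension carries its
    original size and its original periodicity flag.
    """
    if not (len(dims) == len(periods) == len(remain_dims)):
        raise ValueError("dims, periods and remain_dims must have equal length")
    kept = [i for i, keep in enumerate(remain_dims) if keep]
    return [dims[i] for i in kept], [periods[i] for i in kept]
```

For a 2x3x4 grid that is periodic in the first and last dimension, keeping those two dimensions should give a 2x4 sub-grid that is periodic in both.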
v1.5 is a major new release series. It includes many new features and
changes over the v1.4.x series. The most noticeable changes are the
addition of the lstopo-no-graphics program (which does not require any
heavy external library such as Cairo) and the discovery of instruction
caches. There are also many small improvements to all backends, and
some deprecated features have been removed.
Version 1.5.0
-------------
* Backends
+ Do not limit the number of processors to 1024 on Solaris anymore.
+ Gather total machine memory on FreeBSD.
+ XML topology files do not depend on the locale anymore. Float numbers
such as NUMA distances or PCI link speeds now always use a dot as a
decimal separator.
+ Add instruction caches detection on Linux, AIX, Windows and Darwin.
+ Add get_last_cpu_location() support for the current thread on AIX.
+ Support binding on AIX when threads or processes were bound with
bindprocessor(). Thanks to Hendryk Bockelmann for reporting the issue
and testing patches, and to Farid Parpia for explaining the binding
interfaces.
+ Improve AMD topology detection in the x86 backend (for FreeBSD) using
the topoext feature.
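The XML entry above is about writing floating-point values in a form that does not change with the user's locale. A minimal sketch of the idea in Python, assuming hypothetical helper names (`export_float`, `import_float`) that have nothing to do with hwloc's own code:

```python
def export_float(value):
    """Serialize a float with a dot as decimal separator.

    format() with the 'f' presentation type does not consult the
    locale (only the 'n' type does), so the result is stable.
    """
    return format(value, ".6f")


def import_float(text):
    """Parse a float written by export_float; float() always expects a dot."""
    return float(text)
```

A file written this way under one locale can be read back under another one, since neither direction depends on the locale's decimal separator.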
* API
+ Increase HWLOC_API_VERSION to 0x00010500 so that API changes may be
detected at build-time.
+ Add a cache type attribute describing Data, Instruction and Unified
caches. Caches with different types but same depth (for instance L1d
and L1i) are placed on different levels.
+ Add hwloc_get_cache_type_depth() to retrieve the hwloc level depth of
the given cache depth and type, for instance L1i or L2.
It helps disambiguate the case where hwloc_get_type_depth() returns
HWLOC_TYPE_DEPTH_MULTIPLE.
+ Instruction caches are ignored unless HWLOC_TOPOLOGY_FLAG_ICACHES is
passed to hwloc_topology_set_flags() before load.
+ Add hwloc_ibv_get_device_osdev_by_name() OpenFabrics helper in
openfabrics-verbs.h to find the hwloc OS device object corresponding to
an OpenFabrics device.
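The HWLOC_API_VERSION value above can be decoded into a release number. The sketch below assumes the encoding (major << 16) + (minor << 8) + release, which is consistent with 0x00010500 standing for 1.5.0; the helper names are hypothetical, and a real build-time check would be done with the C preprocessor rather than in Python.

```python
HWLOC_API_VERSION = 0x00010500  # value quoted in the changelog above


def decode_api_version(value):
    """Split an API version number into (major, minor, release).

    Assumes the layout (major << 16) + (minor << 8) + release.
    """
    return (value >> 16) & 0xFFFF, (value >> 8) & 0xFF, value & 0xFF


def api_at_least(value, major, minor, release=0):
    """True if the encoded version is at least major.minor.release."""
    return value >= (major << 16) + (minor << 8) + release
```

With this layout a plain integer comparison is enough to test for a minimum API level, which is what makes the macro convenient for conditional compilation.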
* Tools
+ Add lstopo-no-graphics, a lstopo built without graphical support to
avoid dependencies on external libraries such as Cairo and X11. When
supported, graphical outputs are only available in the original lstopo
program.
- Packagers splitting lstopo and lstopo-no-graphics into different
packages are advised to use the alternatives system so that lstopo
points to the best available binary.
+ Instruction caches are enabled in lstopo by default. Use --no-icaches
to disable them.
+ Add -t/--threads to show threads in hwloc-ps.
* Removal of obsolete components
+ Remove the old cpuset interface (hwloc/cpuset.h) which is deprecated and
superseded by the bitmap API (hwloc/bitmap.h) since v1.1.
hwloc_cpuset and nodeset types are still defined, but all hwloc_cpuset_*
compatibility wrappers are now gone.
+ Remove Linux libnuma conversion helpers for the deprecated and
broken nodemask_t interface.
+ Remove support for "Proc" type name, it was superseded by "PU" in v1.0.
+ Remove hwloc-mask symlinks, it was replaced by hwloc-calc in v1.0.
* Misc
+ Fix PCIe 3.0 link speed computation.
+ Non-printable characters are dropped from strings during XML export.
+ Fix importing of escaped characters with the minimalistic XML backend.
+ Assert hwloc_is_thissystem() in several I/O related helpers.
+ Fix some memory leaks in the x86 backend for FreeBSD.
+ Minor fixes to ease native builds on Windows.
+ Limit the number of retries when operating on all threads within a
process on Linux if the list of threads is heavily getting modified.
paexec:
- Option -x was added. With its help paexec can run one command
per task. If -g is also specified, command's exit status is
analysed. Appropriate task and dependants are marked as "failed"
if it is non-zero.
- First character of -n argument must be alphanumeric, `+', `_',
`:' or `/'. Other symbols are reserved for future extensions.
- With '-n :filename' paexec reads a list of nodes from the
specified file.
- With the help of the new option '-m t=<eot>' the end-of-task string
may be specified, which is an empty line by default.
- Option -md=<delim> was added that overrides the default
delimiter (space character) between tasks in graph mode (-g).
- Output line that contains failed dependants no longer ends with
unnecessary space.
- Long options were completely removed.
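The rules for the -n argument listed above can be summarised in a few lines. This is a sketch of the described behaviour only, not paexec's code: `classify_nodes_arg` is a hypothetical name, and the only cases it distinguishes are the ones the notes mention (a rejected first character, and the `:filename` form).

```python
ALLOWED_FIRST_EXTRA = "+_:/"  # besides ASCII letters and digits


def classify_nodes_arg(arg):
    """Validate a -n argument and tell whether it names a file.

    Returns ("file", name) for ':name' and ("other", arg) otherwise.
    Raises ValueError if the first character is reserved.
    """
    if not arg:
        raise ValueError("empty -n argument")
    first = arg[0]
    if not ((first.isascii() and first.isalnum()) or first in ALLOWED_FIRST_EXTRA):
        raise ValueError("reserved first character in -n argument: %r" % first)
    if first == ":":
        return "file", arg[1:]
    return "other", arg
```

Rejecting every other leading character up front is what leaves room for the future extensions the notes refer to.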
paexec_reorder:
- Fix. "paexec_reorder -g" now handles correctly failed tasks'
output. One extra line after "fatal" is expected.
- Option -m was added. It does the same thing as paexec's -m.
More examples of use and regression tests.
Documentation update, clean-ups and improvements.
Regression tests:
- Signal handling was fixed.
- LC_ALL is always set to C in regression tests, this fixes some
problems in internationalized environment.
mk-configure>=0.23.0 is required at build time
* Changes in SLURM 2.4.1
========================
-- Fix bug in job state handling when upgrading from 2.3 -> 2.4: job state is
now preserved correctly across the transition. This also applies to
2.4.0 -> 2.4.1; no state will be lost. (Thanks to Carles Fenoy)
* Changes in SLURM 2.4.0
========================
-- Cray - Improve support for zero compute node resource allocations.
Partition used can now be configured with no nodes.
-- BGQ - make it so srun -i<taskid> works correctly.
-- Fix parse_uint32/16 to complain if a non-digit is given.
-- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
Bringhurst (LANL).
-- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
compiling with --enable-debug
-- Modify scontrol to require "-dd" option to report batch job's script. Patch
from Don Albert, Bull.
-- Modify SchedulerParameters option to match documentation: "bf_res="
changed to "bf_resolution=". Patch from Rod Schultz, Bull.
-- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
-- In etc/init.d/slurm move check for scontrol after sourcing
/etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
-- Fix in scheduling logic that can delay jobs with min/max node counts.
-- BGQ - fix issue where if a step uses the entire allocation and then
the next step in the allocation only uses part of the allocation it gets
the correct cnodes.
-- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1; the previous
function didn't always work correctly.
-- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
to make a larger small block and are running with sub-blocks.
-- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
-- BGQ - When using an old IBM driver cnodes that go into error because of
a job kill timeout aren't always reported to the system. This is now
handled by the runjob_mux plugin.
-- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
-- Improve memory consumption on step layouts with high task count.
-- BGQ - quieter debug when the real time server comes back but there are
still messages we find when we poll but haven't given it back to the real
time yet.
-- BGQ - fix for if a request comes in smaller than the smallest block and
we must use a small block instead of a shared midplane block.
-- Fix issues on large jobs (>64k tasks) to have the correct counter type when
packing the step layout structure.
-- BGQ - fix issue where if a user was asking for tasks and ntasks-per-node
but not node count the node count is correctly figured out.
-- Move logic to always use the 1st alphanumeric node as the batch host for
batch jobs.
-- BLUEGENE - fix race condition where if a nodeboard/card goes down at the
same time a block is destroyed and that block just happens to be the
smallest overlapping block over the bad hardware.
-- Fix bug when querying accounting looking for a job node size.
-- BLUEGENE - fix possible race condition if cleaning up a block and the
removal of the job on the block failed.
-- BLUEGENE - fix issue if a cable was in an error state make it so we can
check if a block is still makable if the cable wasn't in error.
-- Put nodes names in alphabetic order in node table.
-- If preempted job should have a grace time and preempt mode is not cancel
but job is going to be canceled because it is interactive or other reason
it now receives the grace time.
-- BGQ - Modified documents to explain new plugin_flags needed in bg.properties
in order for the runjob_mux to run correctly.
-- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.
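The parse_uint32/16 entry in the list above describes stricter input checking. The following sketch shows one way such a check can look; it is not SLURM's implementation, `parse_uint` is a hypothetical name, and it deliberately accepts nothing but ASCII digits.

```python
def parse_uint(text, bits=32):
    """Parse an unsigned integer, rejecting any non-digit character.

    Also rejects values that do not fit into the given number of bits.
    """
    stripped = text.strip()
    if not stripped or not all("0" <= ch <= "9" for ch in stripped):
        raise ValueError("not an unsigned integer: %r" % text)
    value = int(stripped)
    if value >= (1 << bits):
        raise ValueError("value %d does not fit in %d bits" % (value, bits))
    return value
```

Checking every character before converting means that input such as "12a" is reported as an error instead of being silently read as 12.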
* Changes in SLURM 2.4.0.rc1
=============================
-- Improve task binding logic by making fuller use of HWLOC library,
especially with respect to Opteron 6000 series processors. Work contributed
by Komoto Masahiro.
-- Add new configuration parameter PriorityFlags, based upon work by
Carles Fenoy (Barcelona Supercomputer Center).
-- Modify the step completion RPC between slurmd and slurmstepd in order to
eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
-- Change the owner of slurmctld and slurmdbd log files to the appropriate
user. Without this change the files will be created by and owned by the
user starting the daemons (likely user root).
-- Reorganize the slurmstepd logic in order to better support NFS and
Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
-- Fix bug in allocating GRES that are associated with specific CPUs. In some
cases the code allocated first available GRES to job instead of allocating
GRES accessible to the specific CPUs allocated to the job.
-- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit}
and job epilog/prolog: slurm_spank_job_{prolog,epilog}
-- spank: Add spank_option_getopt() function to api
-- Change resolution of switch wait time from minutes to seconds.
-- Added CrpCPUMins to the output of sshare -l for those using hard limit
accounting. Work contributed by Mark Nelson.
-- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
additional resources for newly launched tasks. Contributed by Hongjia Cao,
NUDT.
-- BGQ - fixed issue where if a user asked for a specific node count and more
tasks than possible without overcommit the request would be allowed on more
nodes than requested.
-- Add support for new SchedulerParameters of bf_max_job_user, maximum number
of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
University of Oslo.
-- BLUEGENE - fixed issue where MaxNodes limit on a partition only limited
larger than midplane jobs.
-- Added cpu_run_min to the output of sshare --long. Work contributed by
Mark Nelson.
-- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
-- Add sinfo output format option of "%R" for partition name without "*"
appended for default partition.
-- Cray - Add support for zero compute node resource allocation to run batch
script on front-end node with no ALPS reservation. Useful for pre- or post-
processing.
-- Support for cyclic distribution of cpus in task/cgroup plugin from Martin
Perry, Bull.
-- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge Mevik,
University of Oslo.
-- Various performance improvements for up to 500% higher throughput depending
upon configuration. Work supported by the Oak Ridge National Laboratory
Extreme Scale Systems Center.
-- Added jobacct_gather/cgroup plugin. It is not advised to use this in
production as it isn't currently complete and doesn't provide an equivalent
substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.
Disable automatic detection of SGE and SLURM to avoid PLIST divergence.
Changes in OpenMPI 1.6
v1.6 is a major release; it effectively deprecates both the v1.4.x
and v1.5.x series, and replaces v1.4.x as the current "super stable" series.
A forthcoming v1.7 series will become the new "feature" series.
*All users* -- including production users and downstream providers --
are encouraged to upgrade to v1.6.
Note that v1.6 is ABI compatible with the entire v1.5.x series,
but is not ABI compatible with the v1.4.x series.
See http://www.open-mpi.org/software/ompi/versions/ for a description
of Open MPI's release methodology.
Here is a list of changes in v1.6 as compared to v1.5.5:
- Fix some process affinity issues. When binding a process, Open MPI
will now bind to all available hyperthreads in a core (or socket,
depending on the binding options specified).
--> Note that "mpirun --bind-to-socket ..." does not work on POWER6-
and POWER7-based systems with some Linux kernel versions. See
the FAQ on the Open MPI web site for more information.
- Add support for ARM5 and ARM6 (in addition to the existing ARM7
support). Thanks to Evan Clinton for the patch.
- Minor Mellanox MXM fixes.
- Properly detect FDR10, FDR, and EDR OpenFabrics devices.
- Minor fixes to the mpirun(1) and MPI_Comm_create(3) man pages.
- Prevent segv if COMM_SPAWN_MULTIPLE fails. Thanks to Fujitsu for
the patch.
- Disable interposed memory management in fakeroot environments. This
fixes a problem in some build environments.
- Minor hwloc updates.
- Array versions of MPI_TEST and MPI_WAIT with a count==0 will now
return immediately with MPI_SUCCESS. Thanks to Jeremiah Willcock
for the suggestion.
- Update VampirTrace to v5.12.2.
- Properly handle forwarding stdin to all processes when "mpirun
--stdin all" is used.
- Workaround XLC assembly bug.
- OS X Tiger (10.3) has not been supported for a while, so forcibly
abort configure if we detect it.
- Fix segv in the openib BTL when running on SPARC 64 systems.
- Fix some include file ordering issues on some BSD-based platforms.
Thanks to Paul Hargrove for this (and many, many other) fixes.
- Properly handle .FALSE. return parameter value to attribute copy
callback functions.
- Fix a bunch of minor C++ API issues; thanks to Fujitsu for the patch.
- Fixed the default hostfile MCA parameter behavior.
- Per the MPI spec, ensure not to touch the port_name parameter to
MPI_CLOSE_PORT (it's an IN parameter).
instead of "tcsh". This builds ok for me on NetBSD; if it turns out to
cause trouble for anyone, revert it.
The motivation was that the BUILD_DEPENDS accepted either tcsh or
standalone-tcsh, and distbb was latching onto the latter and then
failing trying to install it. If the package turns out to really need
tcsh in some contexts, there are probably other ways to deal with this
issue.
Changes in 1.5.5
----------------
- Many, many portability configure/build fixes courtesy of Paul
Hargrove. Thanks, Paul!
- Fixed shared memory fault tolerance support compiler errors.
- Removed not-production-quality rshd and tmd PLM launchers.
- Minor updates to the Open MPI SRPM spec file.
- Fixed mpirun's --bind-to-socket option.
- A few MPI_THREAD_MULTIPLE fixes in the shared memory BTL.
- Upgrade the GNU Autotools used to bootstrap the 1.5/1.6 series to
all the latest versions at the time of this release.
- Categorically state in the README that if you're having a problem
with Open MPI with the Linux Intel 12.1 compilers, *upgrade your
Intel Compiler Suite to the latest patch version*, and the problems
will go away. :-)
- Fix the --without-memory-manager configure option.
- Fixes for Totalview/DDT MPI-capable debuggers.
- Update rsh/ssh support to properly handle the Mac OS X library path
(i.e., DYLD_LIBRARY_PATH).
- Make warning about shared memory backing files on a networked file
system be optional (i.e., can be disabled via MCA parameter).
- Several fixes to processor and memory affinity.
- Various shared memory infrastructure improvements.
- Various checkpoint/restart fixes.
- Fix MPI_IN_PLACE (and other MPI sentinel values) on OS X. Thanks to
Dave Goodell for providing the magic OS X gcc linker flags necessary.
- Various man page corrections and typo fixes. Thanks to Fujitsu for
the patch.
- Updated wrapper compiler man pages to list the various --showme
options that are available.
- Add PMI direct-launch support (e.g., "srun mpi_application" under
SLURM).
- Correctly compute the aligned address when packing the
datatype description. Thanks to Fujitsu for the patch.
- Fix MPI obscure corner case handling in packing MPI datatypes.
Thanks to Fujitsu for providing the patch.
- Workaround an Intel compiler v12.1.0 2011.6.233 vector optimization
bug.
- Output the MPI API in ompi_info output.
- Major VT update to 5.12.1.4.
- Rankfile 'P'hysical mapping is no longer available.
- Upgrade embedded Hardware Locality (hwloc) to v1.3.2, plus some
post-1.3.2-release bug fixes. All processor and memory binding is
now done through hwloc. Woo hoo! Note that this fixes core binding
on AMD Opteron 6200 and 4200 series-based systems (sometimes known
as Interlagos, Valencia, or other Bulldozer-based chips).
- New MCA parameters to control process-wide memory binding policy:
hwloc_base_mem_alloc_policy, hwloc_base_mem_bind_failure_action (see
ompi_info --param hwloc base).
- Removed direct support for libnuma. Libnuma support may now be
picked up through hwloc.
- Added MPI_IN_PLACE support to MPI_EXSCAN.
- Various fixes for building on Windows, including MinGW support.
- Removed support for the OpenFabrics IBCM connection manager.
- Updated Chelsio T4 and Intel NE OpenFabrics default buffer settings.
- Increased the default RDMA CM timeout to 30 seconds.
- Issue a warning if both btl_tcp_if_include and btl_tcp_if_exclude
are specified.
- Many fixes to the Mellanox MXM transport.
Heavily based on Sun Grid Engine package (parallel/sge).
Open Grid Scheduler/Grid Engine is a free and open-source
batch-queuing system for distributed resource management.
OGS/GE is based on Sun Grid Engine, and maintained by the same
group of external (i.e. non-Sun) developers who have been
contributing code to Sun Grid Engine since 2001.
The Portable Hardware Locality (hwloc) software package provides
a portable abstraction (across OS, versions, architectures, ...)
of the hierarchical topology of modern architectures, including
NUMA memory nodes, sockets, shared caches, cores and
simultaneous multithreading. It also gathers various system
attributes such as cache and memory information as well as the
locality of I/O devices such as network interfaces, InfiniBand
HCAs or GPUs. It primarily aims at helping applications with
gathering information about modern computing hardware so as to
exploit it accordingly and efficiently.
The Son of Grid Engine is a community project to continue Sun Grid Engine.
Notable changes in Son of Grid Engine
-------------------------------------
Version 8.0.0d
--------------
* Bug fixes
* Man and other documentation fixes
* Fix building with older gcc versions
* Provide load average in qstat XML output [#446, #454]
* Partially back out Univa change which broke classic spooling
* Fix -terse in sge_request [#777]
* Other changes (possibly-incompatible)
* Message fixes
Version 8.0.0c
--------------
* Bug fixes
* Man and other documentation fixes
* Build/installation fixes (particularly for Red Hat 6 and Linux 3)
* Fix group ids for submitted jobs [U]
* Fix default value of boolean with JSV [U]
* Windows fixes for helper crashes and Vista GUI jobs [U]
* Ensure parallel jobs are dispatched to the least loaded host [U]
* Correct ownership of qsub -pty output file; was owned by admin user [U]
* Fix format of Windows loadcheck.exe output [U]
* Read from stderr even if stdout is already closed in IJS [U]
* Fix PDC_INTERVAL=NEVER execd parameter [U]
* Fix accounting information for Windows GUI jobs [U]
* Increase default MAX_DYN_EC qmaster param [U]
* Fix qsub -sync y error message and enforce MAX_DYN_EC correctly [U]
* Fix job validation (-w e) behaviour [#716] [U]
* Fix qrsh input redirection [U]
* Avoid warning when submitting a qrsh job [U]
* Print start time in qstat -j -xml output [U]
* Don't raise an error changing resource request on waiting job [#806]
* Don't exit 0 on error with qconf -secl or -sep
* Include string.h in drmaa.h [#712]
* Fix process-scheduler-log with host aliases
* Enhancements
* Base qmake and qtcsh on the current gmake and tcsh source [#289,
#504, #832]
* Support "-binding linear" and "-binding linear:slots"
* Use the hwloc library for all topology information and core
binding, supporting more operating systems (now: AIX, Darwin,
FreeBSD, GNU/Linux, HPUX, MS Windows, OSF/1, Solaris), and more
hardware types (specifically AMD Magny Cours and similar)
* Add task number to execd "exceeds job ... limit"
* Other changes (possibly-incompatible)
* Modify default paths in build files and elsewhere [U]
* Assorted message fixes
* In RPMs, move qsched to qmaster package, and separate drmaa4ruby
* Default to newijs in load_sge_config.sh
* Default to sh, not csh for configured shell
Version 8.0.0b
--------------
* Bug fixes
* Build/installation fixes [including #424, #1349] [(U)]
* Fix execd init script [#1348]
* Man and other documentation fixes [including #614, #764] [(U)]
* Fix contents of admin mail properly [#1307, #1345]
* Fix qalter messages for -tc
* Fix build with -DSGE_PQS_API
* Fix group ids for submitted jobs [U]
* Enhancements
* Update qsched and add man page
* Other changes (possibly-incompatible)
* Avoid the use of /bin/ksh [#1306]
* Change installation defaults to classic spooling, not adding
shadow hosts, and not JMX. [(U)]
Version 8.0.0a
--------------
This is roughly a superset of Univa's 8.0.0 (the V800_TAG from
https://github.com/gridengine/gridengine), with thanks for that.
Changes made there which haven't been included in this version: PLPA
source not removed; some different build/installation defaults
(e.g. for JMX); Univa/UGE "branding" (partly because trademark status
is unknown); authuser not removed (for SDM and testing use).
* Bug fixes
* Many man and other documentation fixes [including #790, #776,
#769, #733, #610, #587, #581, #459, #456, #439, #255, #1288, #797,
#1271, #773] [(U)]
* Some program message fixes [(U)]
* Various build and installation fixes [including #761, #709, #656,
#616, #546, #536, #521, #491, #438, #414, #411, #383, #381, #138,
#455, #344, #438, #1311, #1272, #1273] [(U)]
* Ask for keystore password twice on installation
* Fix qmaster crashes with tightly integrated parallel jobs or
un-discoverable qinstance [#789] [U]
* Report 0 cores and sockets on unsupported Solaris hosts [U]
* Fix malloc hooks which caused crashes, particularly with SuSE 11
[#792, #748, #749] [U]
* Verify the pe task start user in execd in non-CSP mode [U]
* Fix binding parameters parsing [U]
* Fix JSV logging with multiple users submitting jobs on same submit
host [U]
* Fix unresponsive qmaster when modifying the global configuration
in a huge cluster [U]
* Speed up finishing tightly integrated jobs [U]
* Check consistency of JSV binding information properly [U]
* Fix broken project spooling, which caused loss of project when
restarting master when using core binding [U]
* Fix slotwise preemption failure to unsuspend one job per host [#775] [U]
* Fix problems retrieving passwd and group information with large
responses [#1295] [(U)]
* Fix JSV changing default of boolean [U]
* Fix ENABLE_RESCHEDULE_SLAVE=1 [U]
* Allow comma in CMDNAME with Perl JSV scripts [#803]
* Don't put queue into error state when supplementary group id
cannot be set [#185] [U]
* Don't convert LF to CRLF with qrsh -pty [U]
* Fix qconf segfault on bad subordination string [U]
* Fix group ids of submitted jobs [U]
* Disallow -masterq with serial jobs [#155] [U]
* Fix 100% CPU use by shepherd of qsh [U]
* Removed unnecessary binding warning on job starts [U]
* Fix qconf error reports when tmp directory has 755 permissions [U]
* Fix suspending of remote process on qrsh -pty yes <cmd> on Solaris [U]
* Fix starting jobs after global host changed [U]
* Reject invalid load_formula value [U]
* Fix handling of implicitly-requested exclusive resources [U]
* Fix execd vmem reporting on 64-bit Linux [U]
* Fix startup of execd on Windows Vista [U]
* Set xterm's path more appropriately on GNU/Linux [#557]
* Fix generation of admin email from failed jobs [#1307]
* Fix some ill-formed output from qstat -xml [#314]
* Fix handling of multi-line environment variables propagated to
shepherd [#395]
* Fix example MPI PE templates
* Fix bad quoting in JSV sh library
* Fix checking of consumables for parallel jobs across multiple hosts [U]
* Enhancements
* Additional and clarified documentation
* PAM modules for ssh tight integration and access control for
interactive jobs
* Initial core binding support for Solaris/SPARC64 [U]
* Some efficiency improvements and memory leaks fixed [U]
* Ports to S/390 and PARISC GNU/Linux [U]
* New complex m_thread [U]
* Show topology by default in qhost [U]
* qsub -pty switch [#704] [U]
* Improved qmon graphics [#530] [(U)]
* Include bash in default shell list [U]
* A JSV that rejects all jobs [U]
* Files for Scali-MPI
* Ruby DRMAA implementation
* Enable easy building against shared system libraries and use
system openssl and bdb binaries
* New scripts: "qsched" reports resource reservations; "status"
wraps qstat; enable/disable submission; node-selection (idle etc.)
* Restart argument for daemon init scripts
* Improved efficiency of shell JSV if used with bash
* Core dumps from crashing daemons enabled under Linux [U]
* Example host_aliases file [#154]
* Spec file for RPM packaging [#820]
* Other changes (possibly-incompatible)
* Show core binding by default in qstat, qhost (use -ncb for
compatibility) [U]
* Removed Berkeley DB RPC support (recently dropped by BDB) [U]
* Changed position in pending job list for user-rescheduled jobs
(exit99, qmod -rj) and OLD_RESCHEDULE_BEHAVIOR,
OLD_RESCHEDULE_BEHAVIOR_ARRAY_JOB parameters [U]
* Unified GNU/Linux arch strings (lx-*, from lx24-* and lx26-*) [U]
* Default to enabling core binding on GNU/Linux [U]
* Removed Sun service tags support [U]
* Removed obsolete SunHPCT5 files
SLURM is an open-source resource manager designed for Linux
clusters of all sizes. It provides three key functions. First, it
allocates exclusive and/or non-exclusive access to resources
(computer nodes) to users for some duration of time so they can
perform work. Second, it provides a framework for starting,
executing, and monitoring work (typically a parallel job) on a
set of allocated nodes. Finally, it arbitrates contention for
resources by managing a queue of pending work.
===============================================================================
Changes in 1.4.1
===============================================================================
# OVERALL: Several improvements to the ARMCI API implementation
within MPICH2.
# Build system: Added beta support for DESTDIR while installing
MPICH2.
# PM/PMI: Upgrade hwloc to 1.2.1rc2.
# PM/PMI: Initial support for the PBS launcher.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r8675:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.4.1
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.4.1?action=follow_copy&rev=HEAD&stop_rev=8675&mode=follow_copy
===============================================================================
Changes in 1.4
===============================================================================
# OVERALL: Improvements to fault tolerance for collective
operations. Thanks to Rui Wang @ ICT for reporting several of these
issues.
# OVERALL: Improvements to the universe size detection. Thanks to
Yauheni Zelenko for reporting this issue.
# OVERALL: Bug fixes for Fortran attributes on some systems. Thanks
to Nicolai Stange for reporting this issue.
# OVERALL: Added new ARMCI API implementation (experimental).
# OVERALL: Added new MPIX_Group_comm_create function to allow
non-collective creation of sub-communicators.
# FORTRAN: Bug fixes in the MPI_DIST_GRAPH_ Fortran bindings.
# PM/PMI: Support for a manual "none" launcher in Hydra to allow for
higher-level tools to be built on top of Hydra. Thanks to Justin
Wozniak for reporting this issue, for providing several patches for
the fix, and testing it.
# PM/PMI: Bug fixes in Hydra to handle non-uniform layouts of hosts
better. Thanks to the MVAPICH group at OSU for reporting this issue
and testing it.
# PM/PMI: Bug fixes in Hydra to handle cases where only a subset of
the available launchers or resource managers are compiled
in. Thanks to Satish Balay @ Argonne for reporting this issue.
# PM/PMI: Support for a different username to be provided for each
host; this only works for launchers that support this (such as
SSH).
# PM/PMI: Bug fixes for using Hydra on AIX machines. Thanks to
Kitrick Sheets @ NCSA for reporting this issue and providing the
first draft of the patch.
# PM/PMI: Bug fixes in memory allocation/management for environment
variables that was showing up on older platforms. Thanks to Steven
Sutphen for reporting the issue and providing detailed analysis to
track down the bug.
# PM/PMI: Added support for providing a configuration file to pick
the default options for Hydra. Thanks to Saurabh T. for reporting
the issues with the current implementation and working with us to
improve this option.
# PM/PMI: Improvements to the error code returned by Hydra.
# PM/PMI: Bug fixes for handling "=" in environment variable values in
hydra.
# PM/PMI: Upgrade the hwloc version to 1.2.
# COLLECTIVES: Performance and memory usage improvements for MPI_Bcast
in certain cases.
# VALGRIND: Fix incorrect Valgrind client request usage when MPICH2 is
built for memory debugging.
# BUILD SYSTEM: "--enable-fast" and "--disable-error-checking" are once
again valid simultaneous options to configure.
# TEST SUITE: Several new tests for MPI RMA operations.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r7838:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.4
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.4?action=follow_copy&rev=HEAD&stop_rev=7838&mode=follow_copy
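The Hydra fix for "=" in environment variable values listed above concerns a
common parsing pitfall: a NAME=VALUE string must be split at the first "="
only, because the value itself may contain further "=" characters. The
following is a minimal Python sketch of that idea, not Hydra's actual code
(Hydra is written in C); the function name split_env_entry is hypothetical.

```python
def split_env_entry(entry):
    """Split a NAME=VALUE string at the first '=' only.

    Hypothetical helper for illustration; not part of Hydra.
    """
    name, sep, value = entry.partition("=")
    if not sep or not name:
        # No '=' at all, or an empty variable name.
        raise ValueError("not a NAME=VALUE entry: %r" % (entry,))
    return name, value


if __name__ == "__main__":
    # The value keeps its embedded '=' characters.
    print(split_env_entry("LINK_FLAGS=-Wl,-rpath=/opt/lib"))
```

Splitting on every "=" instead would truncate or scramble values such as the
linker flag in the usage line.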
===============================================================================
Changes in 1.3.2
===============================================================================
# OVERALL: MPICH2 now recognizes the OSX mach_absolute_time as a
native timer type.
# OVERALL: Performance improvements to MPI_Comm_split on large
systems.
# OVERALL: Several improvements to error return capabilities in the
presence of faults.
# PM/PMI: Several fixes and improvements to Hydra's process binding
capability.
# PM/PMI: Upgrade the hwloc version to 1.1.1.
# PM/PMI: Allow users to sort node lists allocated by resource
managers in Hydra.
# PM/PMI: Improvements to signal handling. Now Hydra respects Ctrl-Z
signals and passes on the signal to the application.
# PM/PMI: Improvements to STDOUT/STDERR handling including improved
support for rank prepending on output. Improvements to STDIN
handling for applications being run in the background.
# PM/PMI: Split the bootstrap servers into "launchers" and "resource
managers", allowing the user to pick a different resource manager
from the launcher. For example, the user can now pick the "SLURM"
resource manager and "SSH" as the launcher.
# PM/PMI: The MPD process manager is deprecated.
# PM/PMI: The PLPA process binding library support is deprecated.
# WINDOWS: Added support for gfortran and 64-bit gcc libs.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r7457:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.3.2
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.3.2?action=follow_copy&rev=HEAD&stop_rev=7457&mode=follow_copy
Intel(r) Threading Building Blocks (Intel TBB) offers a rich and
complete approach to expressing parallelism in a C++ program.
It is a library that helps you take advantage of multi-core
processor performance without having to be a threading expert.
Intel TBB is not just a threads-replacement library. It represents
a higher-level, task-based parallelism that abstracts platform
details and threading mechanisms for scalability and performance.
Changes in v1.5.4 as compared to v1.5.3:
- Add support for the (as yet unreleased) Mellanox MXM transport.
- Add support for dynamic service levels (SLs) in the openib BTL.
- Fixed C++ bindings cosmetic/warnings issue with
MPI::Comm::NULL_COPY_FN and MPI::Comm::NULL_DELETE_FN. Thanks to
Julio Hoffimann for identifying the issues.
- Also allow the word "slots" in rankfiles (i.e., not just "slot").
(** also to appear in 1.4.4)
- Add Mellanox ConnectX 3 device IDs to the openib BTL defaults.
(** also to appear in 1.4.4)
- Various FCA updates.
- Fix 32 bit SIGBUS errors on Solaris SPARC platforms.
- Add missing ARM assembly code files.
- Update to allow more than 128 entries in an appfile.
(** also to appear in 1.4.4)
- Various VT updates and bug fixes.
- Update description of btl_openib_cq_size to be more accurate.
(** also to appear in 1.4.4)
- Various assembly "clobber" fixes.
- Fix a hang in carto selection in obscure situations.
- Guard the inclusion of execinfo.h since not all platforms have it. Thanks
to Aleksej Saushev for identifying this issue.
(** also to appear in 1.4.4)
- Support Solaris legacy munmap prototype changes.
(** also to appear in 1.4.4)
- Updated to Automake 1.11.1 per
http://www.open-mpi.org/community/lists/devel/2011/07/9492.php.
- Fix compilation of LSF support.
- Update MPI_Comm_spawn_multiple.3 man page to reflect what it
actually does.
- Fix for possible corruption of the environment. Thanks to Peter
Thompson for the suggestion. (** also to appear in 1.4.4)
- Enable use of PSM on direct-launch SLURM jobs.
- Update paffinity hwloc to v1.2, and fix minor affinity
assignment bugs on PPC64/Linux platforms.
- Let the openib BTL auto-detect its bandwidth.
- Support new MPI-2.2 datatypes.
- Updates to support more datatypes in MPI one-sided communication.
- Fix recursive locking bug when MPI-IO was used with
MPI_THREAD_MULTIPLE. (** also to appear in 1.4.4)
- Fix mpirun handling of prefix conflicts.
- Ensure mpirun's --xterm option leaves sessions attached.
(** also to appear in 1.4.4)
- Fixed type of sendcounts and displs in the "use mpi" F90 module.
ABI is preserved, but applications may well be broken. See the
README for more details. Thanks to Stanislav Sazykin for
identifying the issue. (** also to appear in 1.4.4)
- Fix indexed datatype leaks. Thanks to Pascal Deveze for supplying
the initial patch. (** also to appear in 1.4.4)
- Fix debugger mapping when mpirun's -npernode option is used.
- Fixed support for configure's --disable-dlopen option when used with
"make distclean".
- Fix segv associated with MPI_Comm_create with MPI_GROUP_EMPTY.
Thanks to Dominik Goeddeke for finding this.
(** also to appear in 1.4.4)
- Improved LoadLeveler ORTE support.
- Add new WindVerbs BTL plugin, supporting native OpenFabrics verbs on
Windows (the "wv" BTL).
- Add new btl_openib_gid_index MCA parameter to allow selecting which
GID to use on an OpenFabrics device's GID table.
- Add support for PCI relaxed ordering in the OpenFabrics BTL (when
available).
- Update rsh logic to allow correct SGE operation.
- Ensure that the mca_paffinity_alone MCA parameter only appears once
in the ompi_info output. Thanks to Gus Correa for identifying the
issue.
- Fixed return codes from MPI_PROBE and MPI_IPROBE.
(** also to appear in 1.4.4)
- Remove --enable-progress-thread configure option; it doesn't work on
the v1.5 branch. Rename --enable-mpi-threads to
--enable-mpi-thread-multiple. Add new --enable-opal-multi-threads
option.
- Updates for Intel Fortran compiler version 12.
- Remove bproc support. Farewell bproc!
- If something goes wrong during MPI_INIT, fix the error
message to say that it's illegal to invoke MPI_INIT before
MPI_INIT.
more machines. A job is typically a single command or a small script that
has to be run for each of the lines in the input. The typical input is a
list of files, a list of hosts, a list of users, or a list of tables.
1.5.3
-----
- Add missing "affinity" MPI extension (i.e., the OMPI_Affinity_str()
API) that was accidentally left out of the 1.5.2 release.
1.5.2
-----
- Replaced all custom topology / affinity code with initial support
for hwloc v1.1.1 (PLPA has been removed -- long live hwloc!). Note
that hwloc is bundled with Open MPI, but an external hwloc can be
used, if desired. See README for more details.
- Many CMake updates for Windows builds.
- Updated opal_cr_thread_sleep_wait MCA param default value to make it
less aggressive.
- Updated debugger support to allow Totalview attaching from jobs
launched directly via srun (not mpirun). Thanks to Nikolay Piskun
for the patch.
- Added more FTB/CIFTS support.
- Fixed compile error with the PGI compiler.
- Portability fixes to allow the openib BTL to run on the Solaris
verbs stack.
- Fixed multi-token command-line issues when using the mpirun
--debug switch. For example:
mpirun --debug -np 2 a.out "foo bar"
Thanks to Gabriele Fatigati for reporting the issue.
- Added ARM support.
- Added the MPI_ROOT environment variable in the Open MPI Linux SRPM
for customers who use the BPS and LSF batch managers.
- Updated ROMIO from MPICH v1.3.1 (plus one additional patch).
- Fixed some deprecated MPI API function notification messages.
- Added new "bfo" PML that provides failover on OpenFabrics networks.
- Fixed some buffer memcheck issues in MPI_*_init.
- Added Solaris-specific chip detection and performance improvements.
- Fix some compile errors on Solaris.
- Updated the "rmcast" framework with bug fixes, new functionality.
- Updated the Voltaire FCA component with bug fixes, new
functionality. Support for FCA version 2.1.
- Fix gcc 4.4.x and 4.5.x over-aggressive warning notifications on
possibly freeing stack variables. Thanks to the Gentoo packagers
for reporting the issue.
- Make the openib component be verbose when it disqualifies itself due
to MPI_THREAD_MULTIPLE.
- Minor man page fixes.
- Various checkpoint / restart fixes.
- Fix race condition in the one-sided unlock code. Thanks to
Guillaume Thouvenin for finding the issue.
- Improve help message aggregation.
- Add OMPI_Affinity_str() optional user-level API function (i.e., the
"affinity" MPI extension). See README for more details.
- Added btl_tcp_if_seq MCA parameter to select a different ethernet
interface for each MPI process on a node. This parameter is only
useful when used with virtual ethernet interfaces on a single
network card (e.g., when virtual interfaces give dedicated
hardware resources on the NIC to each process).
- Changed behavior of mpirun to terminate if it receives 10 (or more)
SIGPIPEs.
- Fixed oversubscription detection.
- Added new mtl_mx_board and mtl_mx_endpoint MCA parameters.
- Added ummunotify support for OpenFabrics-based transports. See the
README for more details.
Changes in 1.3.1
# OVERALL: MPICH2 is now fully compliant with the CIFTS FTB standard
MPI events (based on the draft standard).
# OVERALL: Major improvements to RMA performance for long lists of
RMA operations.
# OVERALL: Performance improvements for Group_translate_ranks.
# COLLECTIVES: Collective algorithm selection thresholds can now be controlled
at runtime via environment variables.
# ROMIO: PVFS error codes are now mapped to MPI error codes.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r7350:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.3.1
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.3.1?action=follow_copy&rev=HEAD&stop_rev=7350&mode=follow_copy
Changes in 1.3
# OVERALL: Initial support for fine-grained threading in
ch3:nemesis:tcp.
# OVERALL: Support for Asynchronous Communication Progress.
# OVERALL: The ssm and shm channels have been removed.
# OVERALL: Checkpoint/restart support using BLCR.
# OVERALL: Improved tolerance to process and communication failures
when error handler is set to MPI_ERRORS_RETURN. If a communication
operation fails (e.g., due to a process failure) MPICH2 will return
an error, and further communication to that process is not
possible. However, communication with other processes will still
proceed normally. Note, however, that the behavior of collective
operations on communicators containing the failed process is
undefined, and may give incorrect results or hang some processes.
# OVERALL: Experimental support for inter-library dependencies.
# PM/PMI: Hydra is now the default process management framework
replacing MPD.
# PM/PMI: Added dynamic process support for Hydra.
# PM/PMI: Added support for LSF, SGE and POE in Hydra.
# PM/PMI: Added support for CPU and memory/cache topology aware
process-core binding.
# DEBUGGER: Improvements and bug fixes in the TotalView support.
# Build system: Replaced F90/F90FLAGS by FC/FCFLAGS. F90/F90FLAGS are
no longer supported in configure.
# Multi-compiler support: On systems where the C compiler used to
build the mpich2 libraries supports multiple weak symbols and multiple
aliases, the Fortran bindings built into the mpich2 libraries can handle
Fortran compilers other than the one used to build mpich2. Details in README.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r5762:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.3
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.3?action=follow_copy&rev=HEAD&stop_rev=5762&mode=follow_copy
Changes in 1.5.1
- Fixes for the Oracle Studio 12.2 Fortran compiler.
- Fix SPARC and SPARCv9 atomics. Thanks to Nicola Stange for the
initial patch.
- Fix Libtool issues with the IBM XL compiler in 64-bit mode.
- Restore the reset of the libevent progress counter to avoid
over-sampling the event library.
- Update memory barrier support.
- Use memmove (instead of memcpy) when necessary (e.g., source and
destination overlap).
- Fixed ompi-top crash.
- Fix to handle Autoconf --program-transforms properly and other
m4/configury updates. Thanks to the GASNet project for the
--program-transforms fix.
- Allow hostfiles to specify usernames on a per-host basis.
- Update wrapper compiler scripts to search for perl during configure,
per request from the BSD maintainers.
- Minor man page fixes.
- Added --with-libltdl option to allow building Open MPI with an
external installation of libltdl.
- Fixed various issues with -D_FORTIFY_SOURCE=2.
- Various VT fixes and updates.
Notable changes in 1.5
- Added "knem" support: direct process-to-process copying for shared
memory message passing. See http://runtime.bordeaux.inria.fr/knem/
and the README file for more details.
- Updated shared library versioning scheme and linking style of MPI
applications. The MPI application ABI has been broken from the
v1.3/v1.4 series. MPI applications compiled against any prior
version of Open MPI will need to, at a minimum, re-link. See the
README file for more details.
- Added "fca" collective component, enabling MPI collective offload
support for Voltaire switches.
- Fixed MPI one-sided operations with large target displacements.
Thanks to Brian Price and Jed Brown for reporting the issue.
- Fixed MPI_GET_COUNT when used with large counts. Thanks to Jed
Brown for reporting the issue.
- Made the openib BTL safer if extremely low SRQ settings are used.
- Fixed handling of the array_of_argv parameter in the Fortran
binding of MPI_COMM_SPAWN_MULTIPLE (** also to appear: 1.4.3).
- Fixed malloc(0) warnings in some collectives.
- Fixed a problem with the Fortran binding for
MPI_FILE_CREATE_ERRHANDLER. Thanks to Secretan Yves for identifying
the issue (** also to appear: 1.4.3).
- Updates to the LSF PLM to ensure that the path is correctly passed.
Thanks to Teng Lin for the patch (** also to appear: 1.4.3).
- Fixes for the F90 MPI_COMM_SET_ERRHANDLER and MPI_WIN_SET_ERRHANDLER
bindings. Thanks to Paul Kapinos for pointing out the issue
(** also to appear: 1.4.3).
- Fixed extra_state parameter types in F90 prototypes for
MPI_COMM_CREATE_KEYVAL, MPI_GREQUEST_START, MPI_REGISTER_DATAREP,
MPI_TYPE_CREATE_KEYVAL, and MPI_WIN_CREATE_KEYVAL.
- Fixes for Solaris oversubscription detection.
- If the PML determines it can't reach a peer process, print a
slightly more helpful message. Thanks to Nick Edmonds for the
suggestion.
- Make btl_openib_if_include/exclude function the same way
btl_tcp_if_include/exclude works (i.e., supplying an _include list
overrides supplying an _exclude list).
- Apply more scalable reachability algorithm on platforms with more
than 8 TCP interfaces.
- Various assembly code updates for more modern platforms / compilers.
- Relax restrictions on using certain kinds of MPI datatypes with
one-sided operations. Users beware; not all MPI datatypes are valid
for use with one-sided operations!
- Improve behavior of MPI_COMM_SPAWN with regards to --bynode.
- Various threading fixes in the openib BTL and other core pieces of
Open MPI.
- Various help file and man pages updates.
- Various FreeBSD and NetBSD updates and fixes. Thanks to Kevin
Buckley and Aleksej Saushev for their work.
- Fix case where freeing communicators in MPI_FINALIZE could cause
process failures.
- Print warnings if shared memory state files are opened on what look
like networked filesystems.
- Update libevent to v1.4.13.
- Allow propagating signals to processes that call fork().
- Fix bug where MPI_GATHER was sometimes incorrectly examining the
datatype on non-root processes. Thanks to Michael Hofmann for
investigating the issue.
- Various Microsoft Windows fixes.
- Various Catamount fixes.
- Various checkpoint / restart fixes.
- Xgrid support has been removed until it can be fixed (patches
would be welcome).
- Added simplistic "libompitrace" contrib package. Using the MPI
profiling interface, it essentially prints out to stderr when select
MPI functions are invoked.
- Update bundled VampirTrace to v5.8.2.
- Add pkg-config(1) configuration files for ompi, ompi-c, ompi-cxx,
ompi-f77, ompi-f90. See the README for more details.
- Removed the libopenmpi_malloc library (added in the v1.3 series)
since it is no longer necessary.
- Add several notifier plugins (generally used when Open MPI detects
system/network administrator-worthy problems); each has its own
MCA parameters to govern their usage. See "ompi_info --param
notifier <name>" for more details.
- command to execute arbitrary commands (e.g., run a script).
- file to send output to a file.
- ftb to send output to the Fault Tolerant Backplane (see
http://wiki.mcs.anl.gov/cifts/index.php/CIFTS)
- hnp to send the output to mpirun.
- smtp (requires libesmtp) to send an email.
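The btl_openib_if_include/exclude item above describes a precedence rule:
when an _include list is supplied, it overrides any _exclude list. A minimal
Python sketch of that rule follows. It is an illustration only, not Open
MPI's implementation; the function name select_interfaces and the sample
device names are hypothetical.

```python
def select_interfaces(available, include=None, exclude=None):
    """Apply the 'include overrides exclude' precedence rule.

    Hypothetical helper for illustration; not Open MPI code.
    """
    if include:
        # A supplied include list wins; the exclude list is ignored.
        wanted = set(include)
        return [name for name in available if name in wanted]
    # No include list: keep everything that is not excluded.
    unwanted = set(exclude or [])
    return [name for name in available if name not in unwanted]


if __name__ == "__main__":
    devices = ["dev0", "dev1", "dev2"]
    print(select_interfaces(devices, include=["dev1"], exclude=["dev1"]))
    print(select_interfaces(devices, exclude=["dev0"]))
```

With both lists given, only the include list has any effect, which is the
behavior the changelog entry attributes to btl_tcp_if_include/exclude.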
New in 1.4.3
------------
- Fixed handling of the array_of_argv parameter in the Fortran
binding of MPI_COMM_SPAWN_MULTIPLE.
- Fixed a problem with the Fortran binding for
MPI_FILE_CREATE_ERRHANDLER. Thanks to Secretan Yves for identifying
the issue.
- Updates to the LSF PLM to ensure that the path is correctly passed.
Thanks to Teng Lin for the patch.
- Fixes for the F90 MPI_COMM_SET_ERRHANDLER and MPI_WIN_SET_ERRHANDLER
bindings. Thanks to Paul Kapinos for pointing out the issue.
- Fixed various MPI_THREAD_MULTIPLE race conditions.
- Fixed an issue with an undeclared variable from ptmalloc2 munmap on
BSD systems.
- Fixes for BSD interface detection.
- Various other BSD fixes. Thanks to Kevin Buckley for helping to track
all of this down.
- Fixed issues with the use of the -nper* mpirun command line arguments.
- Fixed an issue with coll tuned dynamic rules.
- Fixed an issue with the use of OPAL_DESTDIR being applied too aggressively.
- Fixed an issue with one-sided xfers when the displacement exceeds 2GBytes.
- Change to ensure TotalView works properly on Darwin.
- Added support for Visual Studio 2010.
- Fix to ensure proper placement of VampirTrace header files.
- Added the volatile keyword to a variable used in debugging
(MPIR_being_debugged).
- Fixed a bug in inter-allgather.
- Fixed malloc(0) warnings.
- Corrected a typo in the MPI_Comm_size man page (intra -> inter). Thanks
to Simon number.cruncher for pointing this out.
- Fixed a SegV in orted when given more than 127 app_contexts.
- Removed xgrid source code from the 1.4 branch since it is no longer
supported in the 1.4 series.
- Removed the --enable-opal-progress-threads config option since
opal progress thread support does not work in 1.4.x.
- Fixed a defect in VampirTrace's vtfilter.
- Fixed wrong Windows path in hnp_contact.
- Removed the requirement for a paffinity component.
- Removed a hardcoded limit of 64 interconnected jobs.
- Fix to allow singletons to use ompi-server for rendezvous.
- Fixed bug in output-filename option.
- Fix to correctly handle failures in mx_init().
- Fixed a potential Fortran memory leak.
- Fixed an incorrect branch in some ppc32 assembly code. Thanks
to Matthew Clark for this fix.
- Remove use of undocumented AS_VAR_GET macro during configuration.
- Fixed an issue with VampirTrace's wrapper for MPI_Init_thread.
- Updated mca-btl-openib-device-params.ini file with various new vendor IDs.
- Configuration fixes to ensure CPPFLAGS is handled properly if a non-standard
valgrind location was specified.
- Various man page updates.
I managed to trace things to the file libmetrics/netbsd/metrics.c in
the get_netbw function. Apparently, the code in get_netbw violates
alignment constraints for sparc64. I attached a patch against the result
of a "make patch" in parallel/ganglia-monitor-core. While I was at it, I
also changed proc_run_func somewhat to only count actually running
processes (having a look at NetBSD's ps(1) implementation) - without the
change, I got around 30 running processes on an idle machine.
"Looks good at a quick glance" martin@
Bump PKGREVISION.
to trigger/signal a rebuild for the transition 5.10.1 -> 5.12.1.
The list of packages is computed by finding all packages which end
up having either of PERL5_USE_PACKLIST, BUILDLINK_API_DEPENDS.perl,
or PERL5_PACKLIST defined in their make setup (tested via
"make show-vars VARNAMES=..."), minus the packages updated after
the perl package update.
sno@ was right after all, obache@ kindly asked and he@ led the
way. Thanks!
Changes in v1.4.2 as compared to v1.4.1:
- Fixed problem when running in heterogeneous environments.
- Update LSF support to ensure that the path is passed correctly.
- Fixed some miscellaneous oversubscription detection bugs.
- IBM re-licensed its LoadLeveler code to be BSD-compliant.
- Various fixes for multithreading deadlocks, race conditions, and
other nefarious things.
- Fixed ROMIO's handling of "nearly" contiguous issues (e.g., with
non-zero true_lb).
- Bunches of Windows build fixes.
- Now allow the graceful failover from MTLs to BTLs if no MTLs can
initialize successfully.
- Added "clobber" information to various atomic operations, fixing
erroneous behavior in some newer versions of the GNU compiler suite.
- Update various iWARP and InfiniBand device specifications in the
OpenFabrics .ini support file.
- Fix the use of hostfiles when a username is supplied.
- Various fixes for rankfile support.
- Updated the internal version of VampirTrace to 5.4.12.
- Fixed OS X TCP wireup issues having to do with IPv4/IPv6 confusion
(see https://svn.open-mpi.org/trac/ompi/changeset/22788 for more
details).
- Fixed some problems in processor affinity support, including when
there are "holes" in the processor namespace (e.g., offline
processors).
- Ensure that Open MPI's "session directory" (usually located in /tmp)
is cleaned up after process termination.
- Fixed some problems with the collective "hierarch" implementation
that could occur in some obscure conditions.
- Various MPI_REQUEST_NULL, API parameter checking, and attribute
error handling fixes.
- Fix case where MPI_GATHER erroneously used datatypes on non-root nodes.
- Patched ROMIO support for PVFS2 > v2.7 (patch taken from MPICH2
version of ROMIO).
- Fixed "mpirun --report-bindings" behavior when used with
mpi_paffinity_alone=1. Also fixed mpi_paffinity_alone=1 behavior
with non-MPI applications.
- Ensure that all OpenFabrics devices have compatible receive_queues
specifications before allowing them to communicate. See the lengthy
comment in https://svn.open-mpi.org/trac/ompi/changeset/22592 for details.
- Fix some issues with checkpoint/restart.
- Improve the pre-MPI_INIT/post-MPI_FINALIZE error messages.
- Ensure that loopback addresses are never advertised to peer
processes for RDMA/OpenFabrics support.
- Fixed a CSUM PML false positive.
- Various fixes for Catamount support.
- Minor update to wrapper compilers in how user-specific argv is
ordered on the final command line. Thanks to Jed Brown for the
suggestions.
- Update to PLPA v1.3.2, addressing a licensing issue identified by
the Fedora project. See
https://svn.open-mpi.org/trac/plpa/changeset/262 for details.
- Add check for malformed checkpoint metadata files (Ticket #2141).
- Fix error path in ompi-checkpoint when not able to checkpoint
(Ticket #2138).
- Cleanup component release logic when selecting checkpoint/restart
enabled components (Ticket #2135).
- Fixed VT node name detection for Cray XT platforms, and fixed some
broken VT documentation files.
- Fix a possible race condition in tearing down RDMA CM-based
connections.
- Relax error checking on MPI_GRAPH_CREATE. Thanks to David Singleton
for pointing out the issue.
- Fix a shared memory "hang" problem that occurred on x86/x86_64
platforms when used with the GNU >=4.4.x compiler series.
- Add fix for Libtool 2.2.6b's problems with the PGI 10.x compiler
suite. Inspired directly from the upstream Libtool patches that fix
the issue (but we need something working before the next Libtool
release).
===============================================================================
Changes in 1.2.1
===============================================================================
# OVERALL: Improved support for fine-grained multithreading.
# OVERALL: Improved integration with Valgrind for debugging builds of MPICH2.
# PM/PMI: Initial support for hwloc process-core binding library in
Hydra.
# PM/PMI: Updates to the PMI-2 code to match the PMI-2 API and
wire-protocol draft.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r5425:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.2.1
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.2.1?action=follow_copy&rev=HEAD&stop_rev=5425&mode=follow_copy
===============================================================================
Changes in 1.2
===============================================================================
# OVERALL: Support for MPI-2.2
# OVERALL: Several fixes to Nemesis/MX.
# WINDOWS: Performance improvements to Nemesis/windows.
# PM/PMI: Scalability and performance improvements to Hydra using
PMI-1.1 process-mapping features.
# PM/PMI: Support for process-binding for hyperthreading enabled
systems in Hydra.
# PM/PMI: Initial support for PBS as a resource management kernel in
Hydra.
# PM/PMI: PMI2 client code is now officially included in the release.
# TEST SUITE: Support to run the MPICH2 test suite through valgrind.
# Several other minor bug fixes, memory leak fixes, and code cleanup.
A full list of changes is available using:
svn log -r5025:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.2
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.2?action=follow_copy&rev=HEAD&stop_rev=5025&mode=follow_copy
===============================================================================
Changes in 1.1.1p1
===============================================================================
- OVERALL: Fixed an invalid read in the dataloop code for zero count types.
- OVERALL: Fixed several bugs in ch3:nemesis:mx (tickets #744,#760;
also change r5126).
- BUILD SYSTEM: Several fixes for functionality broken in 1.1.1 release,
including MPICH2LIB_xFLAGS and extra libraries living in $LIBS instead of
$LDFLAGS. Also, '-lpthread' should no longer be duplicated in link lines.
- BUILD SYSTEM: MPICH2 shared libraries are now compatible with glibc versioned
symbols on Linux, such as those present in the MX shared libraries.
- BUILD SYSTEM: Minor tweaks to improve compilation under the nvcc CUDA
compiler.
- PM/PMI: Fix mpd incompatibility with python2.3 introduced in mpich2-1.1.1.
- PM/PMI: Several fixes to hydra, including memory leak fixes and process
binding issues.
- TEST SUITE: Correct invalid arguments in the coll2 and coll3 tests.
- Several other minor bug fixes, memory leak fixes, and code cleanup. A full
list of changes is available using:
svn log -r5032:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.1.1p1
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.1.1p1?action=follow_copy&rev=HEAD&stop_rev=5032&mode=follow_copy
===============================================================================
Changes in 1.1.1
===============================================================================
# OVERALL: Improved support for Boost MPI.
# PM/PMI: Significantly improved time taken by MPI_Init with Nemesis and MPD on
large numbers of processes.
# PM/PMI: Improved support for hybrid MPI-UPC program launching with
Hydra.
# PM/PMI: Improved support for process-core binding with Hydra.
# PM/PMI: Preliminary support for PMI-2. Currently supported only
with Hydra.
# Many other bug fixes, memory leak fixes and code cleanup. A full
list of changes is available using:
svn log -r4655:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.1.1
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.1.1?action=follow_copy&rev=HEAD&stop_rev=4655&mode=follow_copy
===============================================================================
Changes in 1.1
===============================================================================
- OVERALL: Added MPI 2.1 support.
- OVERALL: Nemesis is now the default configuration channel with a
completely new TCP communication module.
- OVERALL: Windows support for nemesis.
- OVERALL: Added a new Myrinet MX network module for nemesis.
- OVERALL: Initial support for shared-memory aware collective
communication operations. Currently MPI_Bcast, MPI_Reduce, MPI_Allreduce,
and MPI_Scan.
- OVERALL: Improved handling of MPI Attributes.
- OVERALL: Support for BlueGene/P through the DCMF library (thanks to
IBM for the patch).
- OVERALL: Experimental support for fine-grained multithreading
- OVERALL: Added dynamic processes support for Nemesis.
- OVERALL: Added automatic as well as statically runtime configurable
receive timeout variation for MPD (thanks to OSU for the patch).
- OVERALL: Improved performance for MPI_Allgatherv, MPI_Gatherv, and MPI_Alltoall.
- PM/PMI: Initial support for the new Hydra process management
framework (current support is for ssh, rsh, fork and a preliminary
version of slurm).
- ROMIO: Added support for MPI_Type_create_resized and
MPI_Type_create_indexed_block datatypes in ROMIO.
- ROMIO: Optimized Lustre ADIO driver (thanks to Weikuan Yu for
initial work and Sun for further improvements).
- Many other bug fixes, memory leak fixes and code cleanup. A full
list of changes is available using:
svn log -r813:HEAD https://svn.mcs.anl.gov/repos/mpi/mpich2/tags/release/mpich2-1.1
... or at the following link:
https://trac.mcs.anl.gov/projects/mpich2/log/mpich2/tags/release/mpich2-1.1?action=follow_copy&rev=HEAD&stop_rev=813&mode=follow_copy
New in OpenPA v1.0.2:
Major Changes:
* Add support for 64-bit PPC.
* Static initializer macros for OPA types.
balaji (1):
* Fix pthread_mutex usage for inter-process shared memory regions.
buntinas (1):
* added OPA typedef for pthread_mutex_t
fortnern (4):
* Add more tests for compare-and-swap.
* Add integer compare-and-swap fairness test.
* Add pointer version of compare-and-swap fairness test.
* Added configure test for pthread_yield.
goodell (6):
* Fix bad include guard in the opa_by_lock.h header.
* Add new "unsafe" primitives. Also minor updates to the docs.
* Add support for 64-bit PPC.
* Update README to reflect 64-bit PPC support.
* Add static initializer macros for OPA_int_t/OPA_ptr_t.
* Actually include the COPYRIGHT and CHANGELOG files in the distribution.
jayesh (1):
* Fixed compiler warnings in NT intrinsics. Now type casting the arguments to NT intrinsics correctly.
Grid Engine 6.2, which has undergone significant changes in qmaster to
significantly improve its scalability in challenging environments, adds
powerful features to the core system, introduces multi cluster support
for the Accounting and Reporting Console (ARCo) and comes with a new
module extending the scope of Grid Engine to a new domain of use cases:
the Service Domain Manager (SDM), also known as project Hedeby, which
allows computational resources to be dynamically (re-)assigned on demand.
plus lots of bug fixes.
This changes the buildlink3.mk files to use an include guard for the
recursive include. The use of BUILDLINK_DEPTH, BUILDLINK_DEPENDS,
BUILDLINK_PACKAGES and BUILDLINK_ORDER is handled by a single new
variable BUILDLINK_TREE. Each buildlink3.mk file adds a pair of
enter/exit markers, which can be used to reconstruct the tree and
to determine first level includes. Avoiding := for large variables
(BUILDLINK_ORDER) speeds up parse time as += has linear complexity.
The include guard reduces system time by avoiding reading files over and
over again. For complex packages this reduces both %user and %sys time to
half of the former time.
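The enter/exit markers can be folded back into a tree with a simple stack.
The following is a minimal sketch in Python, not the pkgsrc make code. It
assumes a marker convention in which a bare package name means "enter" and
the same name prefixed with "-" means "exit"; the function names and the
sample package names are hypothetical.

```python
def build_tree(markers):
    """Rebuild the include tree from a flat list of enter/exit markers.

    Assumed convention (illustration only): "pkg" enters pkg, "-pkg" exits it.
    Returns a dict mapping each package to the ordered list of packages it
    includes directly; the key None holds the first-level includes.
    """
    children = {None: []}
    stack = [None]  # None stands for the top-level package Makefile
    for marker in markers:
        if marker.startswith("-"):
            name = marker[1:]
            if stack[-1] != name:
                raise ValueError(f"unbalanced exit marker: {marker}")
            stack.pop()
        else:
            parent = stack[-1]
            if marker not in children[parent]:
                children[parent].append(marker)
            # A guarded re-include only contributes an empty enter/exit pair.
            children.setdefault(marker, [])
            stack.append(marker)
    if stack != [None]:
        raise ValueError("missing exit marker for: " + ", ".join(stack[1:]))
    return children


def first_level_includes(markers):
    """Return the packages included directly by the top-level Makefile."""
    return build_tree(markers)[None]


# Hypothetical marker stream as it might accumulate via "+=".
markers = "png zlib -zlib -png tiff jpeg -jpeg zlib -zlib -tiff zlib -zlib".split()
tree = build_tree(markers)
print(first_level_includes(markers))
print(tree["tiff"])
```

Because every file only appends its own pair of markers, the flat list can
grow with += and the nesting depth never has to be tracked in a separate
variable.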
Changes since 1.0.7:
- Added support for MPI 2.1
- Added support for MPI_Type_create_resized and
MPI_Type_create_indexed_block datatypes in ROMIO.
- Bug fixes, memory leak fixes and code cleanup.
patch-au compiles sge_arch.c with -ansi so that the stringification hack
works on NetBSD and FreeBSD and probably others. Otherwise architecture
names like nbsd-i386 turn into nbsd-1. From the FreeBSD port.
Bugs fixed in SGE 6.1u5 since release 6.1u4
wrong documentation for upgrade 6.0u2 and higher to 6.1u2 and higher
Multiple loadsensor instances are trying to access the same temp load
file on AIX51
Validation of the Filter List in Simple Query builder fails
qhost -l h=<hostname> does not work
Numbers in error mail too large
use of the same paths for input/output stream must be dealt with
DRMAA Java language binding segfaults on Session.exit() with sol-x86
binaries on AMD64
sgeexecd startup script shouldn't suppress error messages from sge_execd
binary
Advanced Query with wild card character * does not produce correct results.
'Infinity' must be rejected when specified in 'complex_values' or RQS
limits for consumables
Invalid qconf -mrqs crashes qmaster with segmentation fault
RQS: Line wrap of host list introduces syntax error
Row Limit in ARCo Simple Query builder cannot be empty
loadsensor does not work on AIX51
qhost -xml has wrong namespace
QMON: The help for Resource Quotas is not available
qmon fills in fields incorrectly for restoring config for Submit Job
sgemaster -qmaster stop also shuts down shadowd
incorrect dependency on xinetd in init scripts for linux
Latebindings for Advanced Queries does not work
Switching from Simple Query to Advanced Query removes the Latebindings
32-bit Linux binaries are having problems with file access in 64-bit NFS
environments
using of default_domain may prohibit execd installation
Commlib might crash if running out of memory
Configuration file check of automatic installation does not recognize remsh
loadcheck prints error message "kstat finds too many cpus"
Communication library thread locking problem results in qmaster crash
ARCo should not print exception stack trace in the console
TABLESPACE values should be written to dbwriter.conf
Incorrectly considering two host group names to be the same
Clients not disconnecting correctly
SGE util/arch script is broken for AIX 5.3 Operating System
error message given by qalter -q '' <jobid> suggests a memory access problem
bootstrap(5) man page sees itself as sge_conf(5)
qmaster reinstall overwriting an existing installation fails
qconf -ae|-Ae return 0 even if exechost exists already
qconf -dxxxx does not set exit status on error
qconf -as, -am, -ao, -Ae, -Acal, -Ackpt, -Ap should return a non-zero
exit code when the message is "already exists"
qconf -acal doesn't return error code 1 when failed
setting of QMaster port number leads to infinite loop
use of -l tmpdir=abc can crash schedd
load scaling display not working correctly
qstat -j does not print array task information
job hold due to -hold_jid is not indicated as STATE_SYSTEM_ON_HOLD by
drmaa_job_ps(3)
Segmentation fault of sge_schedd
A load sensor reporting values for other hosts does not work
reporting file is lacking information about global consumables, if
log_consumables=false
Wallclock_Time query should be more constrained
"./install_execd -winsvc -auto /path/to/auto.conf" command causes error
The default has to be local spool directory when install_execd is run for
a Windows host
qmaster runs out of memory on AIX
dbw install parameters are not verified
Incorrect slots_total from qstat -F -xml output
Wrong permissions if install_qmaster creates qmaster spool directory
Installation of execution daemon left user unclear which port was chosen.
Exception occurs during the exportation of a query result to pdf
memory leak in sge_execd with qsub -v SGE_* or qsub -V
ARCo should support SJWC 3.1
Bugs fixed in SGE 6.1u4 since release 6.1u3
on Windows installation fails when installing as root and SGE admin
user = none
accounting records for slave tasks of pe jobs should contain the correct
task submission time
check if config parameters qlogin_daemon and rlogin_daemon are paths
parallel scheduling memory leak in sge_schedd
execd installation does not test absolute path for local spool dir
Sort on table column throws exception if explicit SORT specified in
SimpleQuery
Error.jsp contains unbalanced tag
arco_read should be able to create synonyms instead of arco_write
DBW should use batch inserts
prolog and epilog descriptions should include exit codes
It is possible to set negative tickets / shares in qmon and from the
command line
ORDER BY clause ignored in Advanced Query
Queue Consumables query incorrect in ARCo predefined queries
CLI accepts the slot number of more than 10000000
ARCo online help contains invalid, unclear or outdated information
the installation of two rpc databases on the same host fails
DBWriter should not exit if there is a database connection error
Reporting 'View' dropdown menu and 'Save Result' functionality is confusing
DBW derived rules and reporting queries that count jobs need to be updated
incomplete error logging in case of classic spooling failures
Row Limit in Simple Query uses wrong syntax
'NONE' as value is not rejected for queue_conf(5) shell and qsub(1) -S
Upgrade to 6.1u3 fails for PostgreSQL < 8.0, minor issues in
dbdefinition.xml for PSQL > 8.0
dbwriter should write checkpoint to database
dbwriter deletion rules delete tasks of pe_jobs
unclear 'exit_status' description in accounting(5) about Grid Engine
specific status
autoinstall configfile should be parsed and checked for valid input!
qstat -j output is broken for shell_path
the project field should be displayed in the qstat -j output
Wrong variable for calculating daily host values from hourly ones
Pending PE job qstat -j output displays additional useless message when not
running because of RQs
automatic backup is broken!
Spelling mistakes in the qmon help menus
deletion rule for PostgreSQL incorrect for deletion of sge_share_log
qquota broken if quota definition contains "hosts" or "users" scope negation
Access_list(5) man page not precise enough with regard to secondary/primary
group(s)
RQS debitation of running jobs is broken if enabled by -mattr
Set SGE_QMASTER_PORT in settings file if sge_qmaster is not found in
/etc/services file
Failed to deliver STOP signal for subordinated jobs
Missing array job task usage in the accounting file
qhost/qstat can't be interrupted with ctrl-c
typographical errors in messages from install_qmaster
Sort order and row limit cannot be specified together in ARCo Simple Query
builder
Qmaster segfaults with long host resource evaluation expression
Error message for unsupported platforms should be more verbose
qsub does not accept resource strings size larger than 256
Memory leak in drmaa_run_job()/drmaa_run_bulk_job()
ARCo reporting module installation script is broken on Red Hat Enterprise
Linux 4 Update 4
Job predecessor list missing from qstat -j output
In SJWC on Oracle dates appear truncated to just MM/DD/YYYY
configfile check in automatic installation is too strict
load sensor might block execd port
Uninstallation of remote execd if not interactive
Infotext spawned on remote machine with -wait or -ask does not display the
text
Uninstall does not remove the SGE_STARTUP_SCRIPT
qmaster crashes when SGE_ND=1, dl 2 and BDB server spooling
inst_sge -ux all -um fails
Usage string for some commands is incomplete
dbwriter installation can't finish on large amount of data
reprioritize disappears after sge_qmaster restart
qmaster failover should not change the state of any queue
to trigger/signal a rebuild for the transition 5.8.8 -> 5.10.0.
The list of packages is computed by finding all packages which end
up having either of PERL5_USE_PACKLIST, BUILDLINK_API_DEPENDS.perl,
or PERL5_PACKLIST defined in their make setup (tested via
"make show-vars VARNAMES=...").
Bugs fixed in SGE 6.1u3 since release 6.1u2
-------------------------------------------
Issue Sun BugId Description
-------- --------- ------------------------------------------------------------------------------------------
376 4743006 problem with floating point job resource limits
1909 6353628 information provided by qstat -j and qstat -j -xml are not equivalent
2076 6440408 qstat -j messages disagree between plain, XML output
2077 6440412 qstat -j -xml messages incomplete
2138 6506667 forbid deletion of global config values
2194 6527836 authuser binary returns unusable error message!
2249 6568575 SGE does not work if primary group entry is too big in groups map
2270 6575720 ENABLE_ADDGRP_KILL is missing from sge_conf(5)
2272 6575727 sge_shadowd(8) man page is missing some env vars
2274 6564461 Duplicate scheduling info messages for reservation jobs
2276 6575731 share_tree(5) doesn't explain type field
2283 6565821 Oracle, Postgres DBW should prompt for tablespace where indexes and tables should be created
2293 6569088 Resource reservation broken for sequential jobs depending on RQS specified for subset of queues only
2303 6571749 parallel resource reservation broken when non-queue instance based quotas limits apply
2323 6576153 Creating a userset with NONE as a type results in a core dump
2327 6578213 qconf -(A,D,M,R)attr dumps core when the supplied file is empty
2328 6579232 high scheduler dispatching time with many sequential resource reservation jobs and resource quotas
2336 6287501 rctemplates lack of requirement
2338 6585721 Parallel RR broken if jobs wait for queue slots and no RQS configured
2342 6590010 Original primary group vanishes after newgrp command (USE_QSUB_GID=true)
2344 6590079 Resource reservation broken with sequences of identical jobs differing only in their -R y|n
2346 6604155 qmon binary job submit is broken
2351 6597463 qsub -t 1-N:N creates a normal job with one task
2352 6594665 Installation fails on Linux with glibc 2.6
2353 6597423 commit method of UnixLoginModule does not report RuntimeExceptions
2356 6600619 Userset spooling in classic mode is broken
2367 6597547 qdel does not recognize wc_job_range_list as it is defined
2369 6577034 Several qconf options display only single message when a list of messages should be printed
2372 6469494 clients should issue a more explicit error message when qmaster is busy
2374 6589459 Expose the availability of keyword "none" in the manual page of calendar_conf
2382 6569862 Unset old_value out of the scope
2383 6553062 qconf -mc accepts erroneous resource entries without an urgency; qmon gives (poor) error message
2387 6614041 Multiple occurrence of a name in RQS limit definition break classic spooling
2392 6614108 Specifying more than one drmaa_v_env attribute causes spurious error msg
2394 6608259 scheduler prints empty line in messages file after every 'sge_mirror' logging
2396 6608236 scheduling of parallel jobs does not respect consumables, if consumable is referenced in rqs
2400 6564543 sge_shepherd should exit if it cannot write to any of its essential files
2401 6617450 add option to reporting_params for switching off writing of consumables
2404 6618328 qmon displays wrong string for queue filtering
2406 6596931 Incorrect messages in qconf command
2407 6618619 the restore feature does not delete old configuration before restoring
2409 6619016 removing parameters from the reporting_params will not fallback to the default
2410 6619657 qmod -e|-d '*' times out in large clusters
2411 6619662 qhost becomes sluggish in large clusters
2414 6618599 Long running jobs cause incorrect usage summary for ARCo database
2415 6620930 ARCO view_accounting filters out parallel job usage incorrectly
2416 6621482 ju_exit_status should provide means to recognize the intermediate record
2417 6622842 the start_time field in intermediate accounting records is incorrect
2418 6588743 qrsh fails with "connection refused" error message
2419 6391244 qstat -ext reports wrong usage as compared to other commands such as qstat -t or qstat -j
2424 6620253 During the installation the admin user should create web.xml file
2428 6630268 upgrade from 6.0u2 and higher to 6.1u2 and higher does not work
2435 6599335 inst_sge help output for -upd switch is incorrect
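Issue 2351 in the table above concerns the array task range syntax of
qsub -t, n[-m[:s]], where s is the step size. A range such as 1-N:N names
exactly one task (task 1), which per the bug title should still be treated
as an array job. The following Python sketch shows how such a range
expands; expand_task_range is a hypothetical helper for illustration and is
not part of Grid Engine.

```python
import re


def expand_task_range(spec):
    """Expand a task range of the form n[-m[:s]] into a list of task ids."""
    match = re.fullmatch(r"(\d+)(?:-(\d+)(?::(\d+))?)?", spec)
    if match is None:
        raise ValueError(f"malformed task range: {spec!r}")
    first = int(match.group(1))
    # A single number n names only task n; the step defaults to 1.
    last = int(match.group(2)) if match.group(2) else first
    step = int(match.group(3)) if match.group(3) else 1
    if first < 1 or last < first or step < 1:
        raise ValueError(f"invalid task range: {spec!r}")
    return list(range(first, last + 1, step))


print(expand_task_range("1-10:3"))
print(expand_task_range("1-10:10"))
```

With a step equal to the upper bound, the expansion holds a single task id,
which is the case the bug report describes.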
Bugs fixed in SGE 6.1u2 since release 6.1u1
-------------------------------------------
Issue Sun BugId Description
-------- --------- ------------------------------------------------------------------------------------------
- 6590960 Man pages show the wrong version number
2345 6590574 resource quota can prevent dispatching of jobs that requests no resource in this quota
2343 6589807 newline missing from "illegal debug level format" message
2338 6585721 Parallel RR broken if jobs wait for queue slots and no RQS configured
2334 6584632 user/system/operator hold state combinations cause strange qstat output
Bugs fixed in SGE 6.1u1 since release 6.1
-----------------------------------------
Issue Sun BugId Description
-------- --------- ------------------------------------------------------------------------------------------
2323 6576153 Creating a userset with NONE as a type results in a core dump
2317 6574565 Oracle, Postgres FOREIGN KEY fields need to be indexed
- 6573980 'qconf -help' suggests usage of patterns in user_list which is not true
2316 6573508 qrsh with ssh causes job to go in error state when Ctrl-C is pressed
2308 6572803 qhost -xml lacks '>' with initial qhost tag
2309 6572801 sge_queue_values definition does not contain PRIMARY KEY
2321 6571714 Inadequate error message when qconf -sstree is run when no share tree is configured
2241 6568712 util/arch has problem recognizing libc version number with comma
2292 6568578 6.1 upgrade procedure shall exit when there are jobs in the cluster
2249 6568575 SGE does not work if primary group entry is too big in groups map
2284 6565841 Oracle: rollback segments keep filling up, Postgres: delete query keeps running
2306 6564592 SGE 6.1 upgrade procedure is broken when using the classic qmaster spooling
2275 6564503 sge_schedd deadlock upon schedd_job_info job_list being enabled
2250 6558006 qmaster may crash with projects or usersets used in RQS
2243 6555744 qmon crashes when displaying about dialog
2248 6554313 add -u <user> to scheduler category only if there is a resource quota for the user
2238 6551568 need faster resource quota matchmaking and more concise job info messages
- 6550718 qstat -j lacks resource quota info messages in case of "incomplete" resource quotas
2296 6548455 csp mode installation, using /etc/services, qmaster is not starting!
2232 6546807 qhost -j -xml does not work
2325 6542987 drmaa_run_job(3) raises error if drmaa_native_specification has leading spaces
2239 6542137 use of hostgroups in resource quotas is less performant than the full list of hosts
- 6541085 NFS write error on N1GE trace file
2300 6539199 qquota(1) filtering broken for project and pe if -P/-pe switch is not used
2299 6536039 sgeremoterun not working
2201 6529974 Use of MORE fails on some architectures
- 6528949 inst_sge -ux uninstallation of exechost tries to delete local spooldir, even it isn't configured!
2191 6525883 qstat -s hX filtering is broken on darwin
2189 6525375 qacct ignores jobs in output
2320 6513115 in qmon, under calendar configuration, it is possible to modify even if no calendar exists
2326 6506661 sge_conf(5), description for rlogin_daemon and qlogin_daemon is wrong
2307 6433628 qconf -sq all.q@myhost produces no value at all for complex_values (not even NONE)
2289 6565951 Qmon panel does not check for valid data in Scheduler Configuration
2314 6513116 Qmon x qconf inconsistent in allowed characters in attribute names
- 6195248 QMON Job Control Window: Incomprehensible Priority Button
2313 6410592 Double clicking in Consumables/Fixed Attributes list does not behave as a GUI should
2312 6482211 complex attributes whose deletion is denied do not reflect back after the denial message in qmon
2301 6551121 Memory leak in libdrmaa.so
916 6355875 qsub -terse to just output job id
- 6522273 Wrong exit code with qconf -sds
2266 6563346 Wrong usage of 'day' format model in trunc(date) Oracle functions
2187 6562190 memory leak in sge_schedd
2265 6280747 qmon loses sharetree changes
747 6291044 "Modify"-Button is activated but should be grayed
2263 6553066 qmon's Complex Configuration Load and Save buttons did not work
2262 4742097 Qmon has a ticket number limitation
1729 4818801 qmon on secondary screen crashes when "Job Control" is pressed
2261 6538740 clear usage operation should implicitly trigger refresh in share-tree dialogue
2260 6327539 Ability to sort queue instances using each column of the queue instances table
2229 6544869 UNKNOWN group/owner in accounting(5)
2247 6556411 DBW queries "Average Job Turnaround Time", "Average Job Wait Time" might not work
- 6481737 Arco should support webconsole 3.0.x
- 6559385 Calling JGDI getQueueInstanceSummary results in a memory leak
1813 6328064 Queue request -q from sge_request can't be overridden through command line
- 6355674 arcorun can not be used as sge_admin user if the toc file is not available
2164 6514085 Need a possibility to update existing example queries for the ARCo web application
- 6426331 remove util/sge_log_tee from distribution
- 6476263 function job_get_id_string() is not MT safe and used in qmaster
2219 6536426 inst_sge -m fails for non-root when USER variable is not set
1860 6345522 qdel on a job in deleted state does not output any information
2258 5081743 queue status in reporting file is missing.
2050 6422335 still used usersets/project/calendar/pe/checkpoint can be removed under certain conditions
Bugs fixed in SGE 6.1 since release 6.1_beta
--------------------------------------------
Issue Sun BugId Description
-------- --------- ------------------------------------------------------------------------------------------
1941 5086007 qstat -qs doesn't work
2183 6499217 meaningless error in clients when reporting_param flush_time is incorrectly set
- 6525497 JGDI crashes JVM when null is passed to JNI GetStringUTFChars function
2220 6440226 add installation of SGE_Helper_Service to auto installation
2221 6521802 the binary check in inst_sge is wrong!
- 6537633 Extraneous space in qsub's "Invalid month specification." message
2222 6538293 Hybrid user/project share-tree is broken for user sharing amongst array jobs
2180 6518684 Qconf usage x man page inconsistency
2181 6518689 Project man page contains different attribute names.
2171 6516288 Scheduler does not write pid file in daemonize phase
2178 6518607 invalid memory access in cl_com_get_handle
- 6520761 add background mode to N1 Grid Engine Helper Service
- 6233523 loadcheck reports on a hyperthreaded CPU only one processor
- 6276612 provide support for Itanium platform
752 6288953 scalability issue with qdel and very large array jobs
751 6291047 qconf -sstnode cannot find root
- 6303750 Install guide ambiguous on role of CSP
1930 6329378 incorrect qsub error message, if an invalid integer value is passed to the -l option
1858 6344960 qtcsh behaves differently in direct mode from qrshmode
1933 6349037 "qstat -explain E" displays explanation of the same error two times.
1940 6362523 qstat -q filter does compare hosts in queue instances
- 6363245 on some Windows execution hosts, execd hangs after the job has finished
1978 6383256 no newline at end of sge_shepherd's exit_status messages
- 6395078 wrong entry in sgepasswd file wrongly sets whole host in error state
2012 6402127 qconf -suserl reports incorrect status if no users are defined
- 6403152 qconf -as returns error code 0 even in case of unresolvable host
2015 6403810 JavaDocs for DRMAA need improvement
- 6428621 add a reserved complex value to control displaying Windows GUIs
- 6453426 Event clients will not get list updates, when they change their subscription after the registration
- 6461308 Wrong path to spooled parallel jobs with using classic spooling
2130 6501447 No online usage for MacOS X
2141 6506701 sge_shepherd dumps core on linux amd64 for qrsh jobs with very long cmdline (> 10k)
2233 6528950 modifying a RQS with invalid syntax results in its deletion
- 6533952 Admins guide does not mention that parallel environments must be linked with queues
- 6535768 Upgrade chapter 5 in 6.1 install guide must mention abolition of LD_LIBRARY_PATH for Solaris/Linux
- 6535775 Upgrade chapter in 6.1 Install Guide wrongly indicates upgrade from 5.3 were possible
- 6537476 6.1 install guide broken and incomplete with respect to MySQL installation for ARCO
- 6537607 6.1 Admins guide needs improvement on the linking between queues and parallel environments
- 6539215 quota verification time may not grow with the number of queues
2224 6539792 resource quotas broken after qmaster restart
- 6542483 Important changes with Resource Quota chapter in 6.1 admins guide
- 6545277 sge_statistic tables are not documented
2230 6546370 Pivot for ARCo Accounting Queries does not show all the fields
2231 6546802 qstat -F -xml does not show resources
Bugs fixed in SGE 6.1_beta since release 6.1_preview2
-----------------------------------------------------
Issue Sun BugId Description
-------- --------- ------------------------------------------------------------------------------------------
- 6267190 Typo before "About the urgent priority" in Admin Guide
1445 6291021 64 bit solaris BDB rpc server broken
1703 6295319 Admin guide: refers to sge_host(5) instead of host_conf(5)
- 6344917 Error in Embedded Command Line Options example
- 6395075 on Windows, execd doesn't provide useful error messages when SSL keys broken
2188 6421113 CSP mode auto installation: certificates are not copied to submit hosts
- 6444526 Admin guide describes N1GE backup facility, but restore is not described
2196 6472614 Auto installation option failed to save the install log
2182 6513433 remote installation of execd's need enhancement, rework, cleanup
2139 6506690 dbwriter should not use autocommit mode
- 6520257 need to define continuation character behaviour with qconf file formats
- 6521285 describe useful characters for every parameter
2185 6522385 qmon crash in cluster configuration dialog when modifying a host
2192 6525917 qacct -l h=<hostname> dumps core on darwin and linux itanium
2198 6528808 sge_ca script fails on nfs no root access file systems
2202 6530335 qmaster aborts when a resource quota set is modified while jobs are running
2204 6531317 qstat -xml does not show pending/zombie jobs
2206 6531921 qstat -r -xml is not working
2207 6533754 resource quota are modified on qconf -mrqs, even if the editor is exited without saving
Bugs fixed in SGE 6.1_preview2 since release 6.1_preview1
---------------------------------------------------------
Issue Sun BugId Description
-------- --------- ------------------------------------------------------------------------------------------
- 5093930 ARCo should work with MySQL
- 5101053 Regular expressions should also be mentioned in qsub in addition to complex
- 5101735 Needs more boolean operators support for resource requests
56 6205203 Logical OR operator works only with complex attributes of type RESTRING
2135 6506115 Invalid qconf -mattr crashes qmaster
2150 6507572 qconf -Arqs added invalid RQS
2146 6510635 Default requests for complexes not honored by resource quotas
2161 6513944 qmaster core dump with usersets referenced in RQS
2162 6513967 unix groups are not considered by RQS
2166 6515122 add -wd working_dir in addition to -cwd option for submission
1.4.0-pre1
----------
2005-07-16 Ernst Rohlicek jun. <ernst.rohlicek@inode.at>
* Pvm.xs, Pvm.pm: Finished adding functionality of PVM v3.4 - siblings,
contexts and message boxes with their corresponding new constants. All
added functionality tested.
Message handlers still missing. Testing routines for inclusion
in package still missing.
NOTE: Also created an ebuild (installation script) for the Gentoo
distribution.
MPICH2 is an all-new implementation of MPI from the group at Argonne
National Laboratory. It shares many goals with the original MPICH but
no actual code. It is a portable, high-performance implementation of
the entire MPI-2 standard. This release has all MPI-2 functions and
features required by the standard with the exception of support for the
"external32" portable I/O format.
Major enhancements:
A new program, dtop, allows the user to run top across multiple machines
and collate the results in real-time. The default remote commands have
been changed over to ssh from rsh. A test option has been added to all
commands to check if SSH is up and running before attempting an SSH
connection that might otherwise hang. A flag has been added to dsh
allowing the user to copy, execute, and delete a script on all machines
in one step. There are many other small bugfixes and enhancements.
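One way a pre-connection test like the one described above could work is to
attempt a short TCP connect to the SSH port with a timeout, so that an
unreachable machine is skipped instead of stalling the whole run. The
following Python sketch illustrates that idea only; it is not the tool's
actual implementation, and the function names, the default port 22 and the
two-second timeout are assumptions.

```python
import socket


def ssh_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def reachable_hosts(hosts, port=22, timeout=2.0):
    """Filter a list of hosts down to those that accept TCP connections."""
    return [host for host in hosts if ssh_port_open(host, port, timeout)]
```

A successful connect only shows that something is listening on the port; it
does not prove that authentication will succeed, so it is a guard against
hangs rather than a full health check.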