38dc6152c6
Notable User Facing Changes --------------------------- - support for LLVM 13 - CMake: Inter-Procedural Optimization is enabled on code of runtime library (libpocl.so is compiled with -flto on systems that support it). - LTTng tracing improved - more command types are traced, and also some synchronous API calls (like clCreateBuffer) are traced. - poclcc, tests and examples can be disabled with CMake options - Valgrind support improved by making Valgrind aware of pocl's reference counting of cl_* objects - kernels which are called by kernels are now force-inlined - Support for NetBSD. - Support for Unix systems without libdl. - PoCL can now (optionally) respond to SIGUSR2 by printing some live debug information. - improved SPIR support for CUDA devices Notable Bug Fixes ----------------- - Fixed a potential crash on Unix systems without sysfs mounted. - Fixed compilation errors when building on macOS. - Fixed POCL_FAST_INIT macro; POCL_INIT_LOCK must be invoked with only one argument. - Fix bin/poclcc to not depend on OpenCL 2.0 symbols - Fixed miscompilation in kernel loops with multiple conditionals with barriers in them. Other ----- - Add cmake options PARALLEL_COMPILE_JOBS, PARALLEL_LINK_JOBS to use ninja's seperate compile and link job pools. - Improve memory architecture, buffer migration and allocation. Buffers are now allocated on a device when first used (previously each buffer was allocated on every device in context). - the single global LLVMContext was replaced with multiple LLVMContexts, one per OpenCL cl_context. OpenCL code can now be compiled in parallel when using separate cl_contexts. This feature is disabled by default since it significantly slowed down PyOpenCL. This should be resolved by separating LLVM compilation in their own threads in the future. - a new OpenCL extension was added to PoCL: cl_pocl_content_size. The extension allows the user to give optimization hint to PoCL, which will be used internally by PoCL to optimize buffer transfers between multiple devices. |
||
---|---|---|
.. | ||
clusterit | ||
dqs | ||
dsh | ||
fastflow | ||
ganglia-monitor-core | ||
gridscheduler | ||
hwloc | ||
linda | ||
lua-lanes | ||
mpi-ch | ||
ocl-icd | ||
opencl-clang | ||
opencl-clhpp | ||
opencl-headers | ||
openmp | ||
openmpi | ||
openpa | ||
p5-Parallel-Pvm | ||
paexec | ||
parallel | ||
pdsh | ||
pocl | ||
pvm3 | ||
py-billiard | ||
R-promises | ||
sge | ||
slurm-wlm | ||
spirv-llvm-translator | ||
threadingbuildingblocks | ||
Makefile |