Notable User Facing Changes
---------------------------
- support for LLVM 13
- CMake: Inter-Procedural Optimization is now enabled for the runtime library
(libpocl.so is compiled with -flto on systems that support it).
- LTTng tracing improved - more command types are traced, and also
some synchronous API calls (like clCreateBuffer) are traced.
- poclcc, tests and examples can be disabled with CMake options
- Valgrind support improved by making Valgrind aware of pocl's
reference counting of cl_* objects
- kernels which are called by kernels are now force-inlined
- Support for NetBSD.
- Support for Unix systems without libdl.
- PoCL can now (optionally) respond to SIGUSR2 by printing
some live debug information.
- improved SPIR support for CUDA devices
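
The SIGUSR2 item above follows a common pattern: a signal handler records
that a dump was requested, and the runtime prints its state at a safe point.
The sketch below illustrates that pattern only. It is written in Python
rather than PoCL's C, it assumes a Unix system, and the names live_state,
request_dump and poll_debug_dump are hypothetical. These notes do not
describe how PoCL enables the handler or what it prints.

```python
import signal

# Hypothetical stand-in for runtime state; not PoCL's actual bookkeeping.
live_state = {"queued_commands": 3, "live_buffers": 2}
dump_requested = False


def request_dump(signum, frame):
    """Signal handler: only record the request, do the printing elsewhere."""
    global dump_requested
    dump_requested = True


def format_debug_info(state):
    """Render the state as a single line with keys in sorted order."""
    return ", ".join(f"{key}={value}" for key, value in sorted(state.items()))


def poll_debug_dump():
    """Print and return the debug line if a dump was requested, else None."""
    global dump_requested
    if not dump_requested:
        return None
    dump_requested = False
    line = format_debug_info(live_state)
    print(line)
    return line


# Install the handler, then simulate an external `kill -USR2 <pid>`.
signal.signal(signal.SIGUSR2, request_dump)
signal.raise_signal(signal.SIGUSR2)
poll_debug_dump()
```

Keeping the handler down to a flag write avoids doing I/O in signal context,
which is the usual reason for splitting the request from the dump.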
Notable Bug Fixes
-----------------
- Fixed a potential crash on Unix systems without sysfs mounted.
- Fixed compilation errors when building on macOS.
- Fixed POCL_FAST_INIT macro; POCL_INIT_LOCK must be invoked with only one argument.
- Fixed bin/poclcc to not depend on OpenCL 2.0 symbols.
- Fixed miscompilation in kernel loops with multiple conditionals with barriers in them.
Other
-----
- Add CMake options PARALLEL_COMPILE_JOBS and PARALLEL_LINK_JOBS to
use Ninja's separate compile and link job pools.
- Improve memory architecture, buffer migration and allocation.
Buffers are now allocated on a device when first used
(previously each buffer was allocated on every device in context).
- the single global LLVMContext was replaced with
multiple LLVMContexts, one per OpenCL cl_context.
OpenCL code can now be compiled in parallel
when using separate cl_contexts. This feature
is disabled by default since it significantly slowed
down PyOpenCL. This should be resolved in the future by running
LLVM compilation in separate threads.
- a new OpenCL extension was added to PoCL: cl_pocl_content_size.
The extension lets the user give PoCL an optimization hint,
which PoCL uses internally to optimize buffer transfers
between multiple devices.
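
The allocate-on-first-use behaviour described in the memory architecture
item above can be modelled with a small sketch. This is an illustration
under stated assumptions, not PoCL's implementation: the LazyBuffer class,
its methods and the device names are hypothetical, and device memory is
modelled as a plain bytearray.

```python
class LazyBuffer:
    """Hypothetical model of a buffer shared by the devices of one context."""

    def __init__(self, size, devices):
        self.size = size
        self.devices = tuple(devices)
        # Device name -> backing storage; stays empty until a device uses it.
        self.storage = {}

    def storage_on(self, device):
        """Return the storage on `device`, allocating it on first use."""
        if device not in self.devices:
            raise ValueError(f"{device!r} is not in this buffer's context")
        if device not in self.storage:
            self.storage[device] = bytearray(self.size)
        return self.storage[device]

    def allocated_devices(self):
        """List the devices that currently hold an allocation."""
        return sorted(self.storage)


# A context with three devices: creating the buffer allocates nothing.
buf = LazyBuffer(1024, ["cpu", "gpu0", "gpu1"])
# The first use on gpu0 allocates storage there and nowhere else.
buf.storage_on("gpu0")[:4] = b"\x01\x02\x03\x04"
print(buf.allocated_devices())
```

Under the previous scheme the equivalent model would fill the storage table
for every device in the constructor, so memory use grew with the number of
devices in the context even when only one of them touched the buffer.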
OpenCL (Open Computing Language) is an open, royalty-free standard for
cross-platform, parallel programming of diverse accelerators found in
supercomputers, cloud servers, personal computers, mobile devices and embedded
platforms.
PoCL is a portable open source (MIT-licensed) implementation of the OpenCL
standard (1.2 with some 2.0 features supported). In addition to providing an
easily portable multi-device (truly heterogeneous) open-source OpenCL
implementation, a major goal of this project is to improve interoperability
among diverse OpenCL-capable devices by integrating them into a single
centrally orchestrated platform.