It's possible that we won't find a thread to wake up in a
single pass through the worker threads. Now loop until we
do find a worker thread to wake up.
Disabled a CHECK() for the thread that the schedule_on() operator
resumes on since the outcome is actually not guaranteed with the
current implementation of task<T>. See #79 for details.
MSVC was inlining local variables in await_suspend() into the
coroutine frame of the caller, which was breaking the guarantees
required by the for_each_async() test.
This allows parallel execution of a bulk operation using only
a single coroutine frame. The continuation of the loop is eligible
to be stolen by other threads while executing the current iteration.
- Worker threads now spin for a short while before putting
themselves to sleep. This reduces the overhead of enqueueing
items as the enqueueing thread doesn't have to call into the OS
to wake up a thread so often. It should also improve the
responsiveness of worker threads.
- Keep track of the number of sleeping threads in an atomic integer
so that enqueueing threads only need to look in one place to check
whether any threads need to be woken rather than scanning the
thread-state of each worker thread.
- Make m_globalQueueHead atomic so that worker threads that are
spinning waiting for new work can perform an approximate check
for new work without needing to acquire the mutex lock.
- Fixed incorrect initialisation of thread_state::m_mask.
- Modified try_local_enqueue() to attempt to enlarge the local
queue if it looks like it's run out of space.
- Removed doubly-linked list from schedule_operation as it wasn't
needed.
- Catch any potential exceptions thrown by auto_reset_event::wait()
and just sleep for 1ms in this case rather than terminating the
thread.
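
The sleeping-thread counter described above can be sketched roughly
as follows. This is a minimal illustration only: the class name
sleeper_count and its member functions are hypothetical and are not
taken from the actual thread-pool code. It assumes workers increment
the counter just before blocking and that an enqueueing thread claims
one sleeper by decrementing it before waking that thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names are illustrative, not the library's own.
class sleeper_count
{
public:
    // Called by a worker thread just before it goes to sleep.
    void on_sleep() noexcept
    {
        m_sleeping.fetch_add(1, std::memory_order_seq_cst);
    }

    // Called by an enqueueing thread. Returns true if it claimed a
    // sleeping worker that it should now wake, false if no worker is
    // asleep and no OS call is needed.
    bool try_claim_sleeper() noexcept
    {
        auto count = m_sleeping.load(std::memory_order_seq_cst);
        while (count > 0)
        {
            // On failure 'count' is reloaded with the current value.
            if (m_sleeping.compare_exchange_weak(
                    count, count - 1, std::memory_order_seq_cst))
            {
                return true;
            }
        }
        return false;
    }

    std::uint32_t sleeping() const noexcept
    {
        return m_sleeping.load(std::memory_order_seq_cst);
    }

private:
    std::atomic<std::uint32_t> m_sleeping{0};
};
```

The point of the single counter is that the common case (no worker
asleep) costs one atomic load instead of a scan over every worker's
state.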
Don't write to the flag until it looks like the lock has become
available. This should reduce how often the cache line bounces
unnecessarily between cores under contention.
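
This is commonly known as a test-and-test-and-set lock. A minimal
sketch of the idea, assuming a plain atomic bool as the flag; the
class name ttas_spin_lock and the use of yield() while waiting are
illustrative choices, not the actual implementation:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical sketch; names are illustrative, not the library's own.
class ttas_spin_lock
{
public:
    void lock() noexcept
    {
        for (;;)
        {
            // Read-only wait: loads let the cache line stay shared
            // between the waiting cores instead of being invalidated.
            while (m_locked.load(std::memory_order_relaxed))
            {
                std::this_thread::yield();
            }

            // The lock looks free, so only now attempt the write.
            if (!m_locked.exchange(true, std::memory_order_acquire))
            {
                return;
            }
        }
    }

    bool try_lock() noexcept
    {
        // Skip the write entirely if the lock is visibly held.
        return !m_locked.load(std::memory_order_relaxed)
            && !m_locked.exchange(true, std::memory_order_acquire);
    }

    void unlock() noexcept
    {
        // Unlocking a lock that is not held is a caller bug.
        assert(m_locked.load(std::memory_order_relaxed));
        m_locked.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> m_locked{false};
};
```

Because it provides lock(), try_lock() and unlock(), a type like this
can be used with std::lock_guard or std::unique_lock.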
The msvcurt[d].lib library is actually intended for C++/CLI usage
and seems to no longer be bundled with the default install in more
recent versions of Visual Studio 2017.
Should fix #73
Factors out the implementation of the OS calls that was duplicated
in both the cancellable and non-cancellable operation objects into
an 'impl' class that can be used by both.
Also, fixed potential data race in some try_start() methods when
trying to read the m_skipCompletionOnSuccess member of a socket
after starting the I/O operation. Now ensure that this flag is
read before starting the operation.
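
The shape of that fix can be sketched as below. This is an
illustration under stated assumptions, not the actual socket code:
fake_socket, the free function try_start() and the startIo callback
are all hypothetical stand-ins, and the return-value convention
(true means a completion event is still expected) is assumed.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a socket; only the flag matters here.
struct fake_socket
{
    bool m_skipCompletionOnSuccess = false;
};

// Returns true if the caller should wait for a completion event,
// false if the operation has already finished and none will arrive.
// 'startIo' is a hypothetical callback that starts the operation and
// returns true if it completed synchronously.
bool try_start(
    fake_socket& socket,
    const std::function<bool(fake_socket&)>& startIo)
{
    // Read the flag BEFORE starting the operation. Once the I/O has
    // been started, another thread may process its completion, after
    // which the socket may no longer be safe to read.
    const bool skipCompletionOnSuccess = socket.m_skipCompletionOnSuccess;

    const bool completedSynchronously = startIo(socket);

    // Only the local copy is consulted after the operation starts.
    return !(completedSynchronously && skipCompletionOnSuccess);
}
```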
Simplified logic for file_[read/write]_operation slightly by
passing out-parameter to receive number of bytes read/written
if operation completes synchronously. This avoids an extra call
to GetOverlappedResult() to retrieve this information.