7.3 KiB
layout | title | date | categories | tags | summary | thumbnail | offset | ||||
---|---|---|---|---|---|---|---|---|---|---|---|
post | Experimenting with Worker Threads | 2020-04-15 08:01:23-0400 |
|
🧵 Multi-threaded code in JavaScript...sort of | /blog/assets/2744507570_ba4ba165f4_k.png | -40% |
As I have started working on Uxuyu, one of the key technical aspects is making a large number of web requests. Since I'm prototyping this using Proton Native and JavaScript is traditionally single-threaded, this presents a small problem; since web calls can take a while, traditional programming techniques would lock up the user interface, which isn't really viable.
Fortunately, as of Node.js v10.5.0, JavaScript on the desktop (such as Proton Native) has what they call worker threads, an approach to forcing JavaScript to perform multiple tasks at (approximately) the same time.
So, these are some quick notes on getting worker threads...well, working. It was easy enough, but there are some points where it's unclear what is supposed to happen, with "minimal example" code all having strange and unnecessary features. As usual for the techtips posts, I'm mostly writing this for my own records for the next time I need to use worker threads and get frustrated by the existing documentation, but if it helps someone else, that's great, too.
Threads, in General
Originally, Sun Microsystems created what they called "light-weight processes," a system where multiple code-paths can run in parallel within the same program or processes. As other languages implemented similar approaches, the term evolved into "threads."
If multiple threads are run under the same process, this typically gives benefits over a multi-process approach with interprocess communication, since most of the system state can be shared, saving overhead on context switches and thread creation. If you haven't taken an operating systems course and don't recognize those terms, they basically boil down to not needing to keep pausing and restarting programs, since everything should be running from the same package.
Generally speaking, threads have a handful of common operations:
- Create sets up the new thread and assigns it a workload and initial data to work with.
- Exit ends the thread from the inside, leaving the data to be harvested by the main program.
- Join takes the data from the ended thread to make it available to the main program.
That's not the entire model, of course. There are a lot of utility features allowing the programmer to set different parameters and retrieve information, but the core process is create-exit-join.
Worker Threads
Node's worker threads...aren't that.
In some ways, it makes sense. The standard approach to threading goes back to the early 1990s, and it's now almost thirty years later, so maybe we've learned some things that make life easier. And then again...well, we'll see.
Thread Creation
We launch a thread almost normally, but with a twist that makes me extremely suspicious about how this all works under the covers.
const { Worker } = require('worker_threads');
const worker = new Worker(
'./workercode.js',
{
workerData: someObjectWithInitialData,
}
);
Typically, threads are given functions to run. Worker threads are different, though, taking a file. This is where suspicion starts to come in, as sending execution to a separate file implies that the thread is a separate program, rather than a single program sharing state.
Thread Handlers
The worker thread has three events we can chose to handle.
worker.on('message', this.acceptUpdate);
worker.on('error', this.reportUpdateError);
worker.on('exit', this.reportExit);
Each handler function takes a single parameter. The message can be an arbitrary object. The error is a JavaScript Error
object. The exit code is an integer.
There is also an online handler, announcing when the thread has started execution, taking no parameters, if that's useful to you.
Returning Data
Worker threads don't really exit and join, though I suppose an exit value could be used to simulate that. Instead, the thread takes its initial state from a default workerData
variable (imported from the worker_threads
library) and sends messages back to the main thread.
const {
parentPort,
workerData,
} = require('worker_threads');
parentPort.postMessage(someObjectWithResults);
The message handler (acceptUpdate()
, in the example above) then receives a copy of someObjectWithResults
.
This also works in the opposite direction, with the main thread sending messages to the worker.
worker.postMessage(updateForTheThread);
These are surprising improvements over traditional threading libraries, because it allows the thread to easily send and receive updates whenever it gets them instead of waiting until it's out of work to return everything it has collected or messing around in shared memory. However, this still smells of running in a separate process, basically treating the thread as a peer to coordinate with across a network connection or a special kind of shared file called a "pipe" that I won't bother to discuss, here.
Join
All that said, we still get a traditional join operation, where the main thread can harvest data from the worker.
worker.getHeapSnapshot();
This call fails unless the thread has exited, meaning that it's best run in the exit handler (reportExit()
, in the example above), and makes the worker threads feel less like a separate process.
Going Further
So, after all that, I'm still not 100% convinced that worker threads are actually threads, but they seem to mostly do the job and that's mostly what matters.
There's actually a lot more available, here, too. The threads can communicate through console I/O. A thread can set up additional communications channels, which can be passed to the parent for another thread, allowing two worker threads to communicate directly. Ports (endpoints to a communications channel) can be manipulated to prevent the thread from exiting, and so forth.
Like I said, though, we have our basic create-exit-join model plus communication back and forth, which is fairly useful for a lot of kinds of work. If they're not "really" threads, it doesn't matter much, as long as the code doesn't block and they basically act like threads.
Credits: The header image is Threads by Dave Gingrich and made available under the terms of the Creative Commons Attribution Share-Alike 2.0 Generic license.