Themes of this release/New features
- Changed Unicode escapes in strings (bobhy)
WARNING
Breaking Change: You need to update escapes like "\u0043" to "\u{0043}"
This format allows you to insert any Unicode code point into a string by
specifying its value as 1 through 6 hex digits (with or without leading
zeros, upper or lower case). The maximum value is \u{10ffff}, which is the
largest Unicode code point defined.
We've simply dropped support for the old format since we're pre-1.0 and
didn't want to carry forward redundant syntax. You will have to change any
Unicode escapes in your scripts to the new format.
Why change? The old 4-digit syntax could not natively address recent
extensions to the Unicode standard, such as emoji, CJK extensions and traditional
scripts. There is a cumbersome workaround in the form of surrogate pairs,
but this is not intuitive.
Why this change? The new format allows you to specify any Unicode code point
with a single, predictable syntax. Rust and ECMAScript 6 support the same
syntax. (Full disclosure: C++, Python and Java don't.)
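For comparison, here is a minimal Rust sketch (not Nushell code) of the same escape form, since Rust is named above as sharing it. The helper names escaped_samples and utf16_units are made up for this illustration. It also counts UTF-16 code units, to show why a code point above U+FFFF needs a surrogate pair when only 4 hex digits are available.

```rust
// Rust accepts the \u{...} escape form described above (1 to 6 hex digits).
// `escaped_samples` is a hypothetical helper used only for this illustration.
fn escaped_samples() -> [&'static str; 3] {
    [
        "\u{0043}",  // LATIN CAPITAL LETTER C, written with leading zeros
        "\u{43}",    // the same code point without leading zeros
        "\u{1F600}", // a code point that does not fit in 4 hex digits
    ]
}

// Number of UTF-16 code units needed for a string. Code points above
// U+FFFF take two units, i.e. a surrogate pair.
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    let samples = escaped_samples();
    for s in samples.iter() {
        println!("{}: {} UTF-16 unit(s)", s, utf16_units(s));
    }
    // The largest valid code point, as stated in the text above.
    let max = '\u{10ffff}';
    println!("max code point: U+{:X}", max as u32);
}
```

The first two samples are the same string, and only the third needs more than one UTF-16 unit.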
- -g grapheme cluster flags for str length, str substring, str index-of, split
words and split chars (webbedspace)
As you know, str length, str substring, str index-of and split words measure
the length of strings and substrings in UTF-8 bytes, which is often very
unintuitive - all non-ASCII characters are of length 2 or more, and splitting
a non-ASCII character can create garbage characters as a result.
A much better alternative is to measure the length in extended grapheme
clusters. In Unicode, a "grapheme cluster" tries to map as closely as
possible to a single visible character. This means, among many other things:
- Non-ASCII characters, such as ん, are considered single units of length 1,
no matter how many UTF-8 bytes they use.
- Combined characters, such as e and ◌́ being combined to produce é, are
considered single units of length 1.
- Emojis, including combined emojis such as 🇯🇵, which is made of the 🇯 and
🇵 regional indicator symbols, are considered single units of length 1.
(This is a property of "extended" grapheme clusters.)
- "\r\n" is considered a single unit of length 1.
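To make the gap concrete, the following Rust sketch (an illustration, not Nushell's implementation) reports the UTF-8 byte length and the code point count for the examples in the list above. The function name byte_and_code_point_len is an assumption for this example. Grapheme counting itself is not shown because Rust's standard library has no grapheme segmentation; each sample is one extended grapheme cluster according to the list.

```rust
// Returns (UTF-8 byte length, number of Unicode code points) for a string.
// Hypothetical helper; it does NOT count grapheme clusters.
fn byte_and_code_point_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let samples = [
        ("hiragana n", "\u{3093}"),               // ん
        ("e + combining acute", "e\u{0301}"),     // é in decomposed form
        ("flag of Japan", "\u{1F1EF}\u{1F1F5}"),  // two regional indicators
        ("CRLF", "\r\n"),
    ];
    for (label, s) in samples.iter() {
        let (bytes, code_points) = byte_and_code_point_len(s);
        println!("{}: {} bytes, {} code points", label, bytes, code_points);
    }
}
```

None of the four samples has a byte length of 1, and three of them are more than one code point, even though each reads as a single character.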
The new --graphemes/-g flag can be used with str length, str substring, str
index-of and split words to enable these length/indexing measurements.
In addition, the flag has been added to split chars. Notably, this command
splits on Unicode code points rather than UTF-8 bytes, so it doesn't have
the issue of turning non-ASCII characters into garbage characters.
However, combining emoji and combining characters do not correspond to
single code points, and are split by split chars. The -g flag keeps those
characters intact.
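The difference between cutting at a byte offset and splitting on code points can be sketched in Rust as follows. This is an analogy rather than Nushell's actual code: cut_at_byte and split_code_points are hypothetical helpers, and Rust's lenient decoder (which substitutes U+FFFD) stands in for the "garbage characters" mentioned above.

```rust
// Cut a string after `n` UTF-8 bytes and decode the result leniently.
// A sequence broken by the cut is replaced with U+FFFD.
fn cut_at_byte(s: &str, n: usize) -> String {
    let end = n.min(s.len());
    String::from_utf8_lossy(&s.as_bytes()[..end]).into_owned()
}

// Split a string into one String per Unicode code point.
fn split_code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| c.to_string()).collect()
}

fn main() {
    // Cutting ん (3 bytes) after its first byte breaks the character.
    println!("{:?}", cut_at_byte("\u{3093}", 1));
    // Splitting on code points keeps ん whole...
    println!("{:?}", split_code_points("\u{3093}"));
    // ...but separates a base letter from its combining accent.
    println!("{:?}", split_code_points("e\u{0301}"));
}
```

Keeping the letter and its accent together requires grapheme-aware splitting, which is what the -g flag provides.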
These commands also have --utf-8-bytes/-b flags which enable the legacy
behavior (and split chars has --code-points/-c). These currently do not do
anything and need not strictly be used, since UTF-8 byte lengths are still
the default behavior. However, if this default someday changes, then these
flags will guarantee that the legacy behavior is used.
- New enumerate command (JT)
A new enumerate command will enumerate the input, and add an index and item
record for each item. The index is the number of the item in the input
stream, and item is the original value of the item.
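As a rough analogy in Rust (not the Nushell implementation), the output shape described above can be sketched with the standard Iterator::enumerate adapter. The Indexed struct and enumerate_records function are hypothetical names, and the zero-based index is an assumption carried over from Rust's own enumerate.

```rust
// One output record per input item: its position and the original value.
// `Indexed` is a hypothetical type used only for this illustration.
#[derive(Debug, PartialEq)]
struct Indexed<T> {
    index: usize,
    item: T,
}

// Pair every item of the input with its position (zero-based here,
// which is an assumption of this sketch).
fn enumerate_records<T>(input: impl IntoIterator<Item = T>) -> Vec<Indexed<T>> {
    input
        .into_iter()
        .enumerate()
        .map(|(index, item)| Indexed { index, item })
        .collect()
}

fn main() {
    for record in enumerate_records(vec!["a", "b", "c"]) {
        println!("{:?}", record);
    }
}
```

Because the numbering is a separate step, it composes with any later stage instead of being a flag on each command.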
The enumerate command is a more modular and composable approach than
hard-coding --numbered flags into commands like each. (Note: The --numbered
flags have not been removed yet.)
- Breaking changes to the web-related commands (Kazuki-Ya, VincenzoCarlino)
We decided to move some of the important commands for interacting with HTTP
under their own http subcommands for better discoverability. The common
fetch command is now http get.
- main command exported from module defines top-level module command (kubouch)
Defining and exporting a main command from a module allows creating a command
with the same name as the module.
The same thing works with overlay use as well. Note that the main command
continues to work the same way as before when running a script.
Combined with a recent bugfix, this feature allows for a nicer way of defining
known externals and custom completions.
It is also a stepping stone towards being able to handle directories which
in turn is a stepping stone towards having proper Nushell packages.
- Progress bar for save command (Xoffio)
To watch the progress when saving large files you can now pass the
--progress flag to save. It gives information about the throughput and an
interactive progress bar if available.
Breaking changes
- Unicode escapes in strings now use an extended format \u{X...}. Any
scripts using the old syntax \uXXXX will have to be updated. See also #7883.
- The to url command has been renamed and moved to url build-query as this
better reflects its role as a Nushell-specific URL command compared to a
conversion. (#7702)
- fetch has been renamed to http get (#7796)
- post has been renamed to http post (#7796)
- Quotes are trimmed when escaping to cmd.exe (#7740)
- parse -r now uses zero-indexed rows and uncapitalized columns (#7897)
- last, skip, drop, take until, take while, skip until, skip while, where,
reverse, shuffle, append, prepend and sort-by raise an error when given
non-lists (#7623)
- to csv and to tsv now throw an error on unsupported inputs (#7850)