Files like Makefile.am and configure.ac are usually not used during a
build, therefore there's no point in checking these for shell portability
issues.
A commonly occuring scenario is that a package patches the configure
script, but that the corresponding configure.in contains shell code that
is not portable. In cases like these, configure.in is typically not used
during the build, therefore there is no need to check it for portability.
This also applies to all other combinations where a file is patched and
the corresponding file.in contains unportable shell code.
Having quotes around the sed commands does not change their meaning.
These quotes must not lead to syntax errors when parsing the shell
command. This happened in mk/subst.mk r1.91 because the double quote was
accidentally escaped.
The escaping inside the backticks had been wrong. Because of this,
parentheses and semicolons were interpreted as shell syntax.
Switching to $(...) command substitution removes the need for quoting
some of the characters and makes the whole command simpler to understand.
Doing the escaping for the backticks command properly would have involved
lots of special cases.
The $(...) command substitution was used sparingly in pkgsrc up to now
because some older or broken shells do not support it. Since these
shells do not end up as the shell that runs the commands from Makefiles,
that's not a problem.
Right now, the tests in regress/tools are a mixture of testing the pkgsrc
infrastructure in mk/tools and the tools provided by the platforms.
These don't go well together, therefore the tests for the platform tools
will be migrated here.
ShellCheck complained about a parse error in the empty functions, which
agrees with the shell grammar given in IEEE 1003.1, both the 2004 Edition
and the 2018 Edition.
To work properly, the $(...) should have been $$(...).
In pkgsrc the command substitution is usually done via `backticks`, for
compatibility with /bin/sh from Solaris. To fix the shell parse errors,
the special characters are properly escaped inside the command
substitution.
Since SUBST_FILTER_CMD is a shell command, it may contain arbitrary
characters. The condition in mk/subst.mk that tested whether
SUBST_FILTER_CMD was the default filter command was evaluated at run
time. In such an evaluation, the variables (lhs and rhs) are fully
expanded before parsing the condition. This means that these variables
must not contain quotes or unquoted condition operators.
Exactly this situation came up in one of the regression tests. The
quoted "0-9" was copied verbatimly into the condition, including the
quotes. The resulting condition was:
"tr -d "0-9"" == "LC_ALL=C /usr/bin/sed "
This produced a syntax error because of the left-hand side. Adding a :Q
modifier would have helped for the left-hand side, but this would have
been necessary for the right-hand side as well. Since an empty SUBST_SED
is defined not to "contain only identity substitutions", the first
condition can simply be removed.
The whole condition in the shell program had not worked anyway since it
expanded to either "[ true ]" or to "[ false ]", and both of these
commands exited successfully.
There are several cases where patterns like s|man|${PKGMANDIR}| appear in
SUBST_SED. Up to now, these had been categorized as no-ops and required
extra code to make the package build when SUBST_NOOP_OK was set to "no".
This was against the original intention of SUBST_NOOP_OK, which was to
find outdated substitution patterns that do not occur in SUBST_FILES
anymore, most often because the packages have been updated since.
The identity substitutions do appear in the files, they just don't change
them. Typical cases are for PKGMANDIR, DEVOSSAUDIO, PREFIX, and these
variables may well be different in another pkgsrc setup. These patterns
are therefore excluded from the SUBST_NOOP_OK check.
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
says in section 9.3.2 BRE Ordinary Characters that only very few
characters may be preceded with a backslash.
As a side effect, this change allows parentheses in the variable names
listed in SUBST_VARS (even if that will never happen in practice).
The reason that the regression test had not replaced VAR.[] before was
simply that this variable had not been listed in SUBST_VARS.
The documentation now starts with a high-level introduction instead of
listing only the details.
The name of the function test_file had to be changed since the old name
was not expressive enough. Same for the variable real_pkgsrcdir.
These tests demonstrate that it is not easy to exclude only one top-level
directory from being extracted, using the example of lang/gcc*, which has
a top-level directory contrib/ that contains shell programs with
non-portable code, but the same archive also contains libjava/contrib, and
that should still be extracted.
The severity now depends only on the setting of SUBST_NOOP_OK. Right now
this means that some former warnings will be reported as info only, but
that will change after switching the default of SUBST_NOOP_OK after
2020Q1. Then they will all be reported as warnings, followed by the final
error saying that the pattern has no effect.
This change makes it easier to detect inconsistencies and outdated
definitions, for example by setting the global SUBST_NOOP_OK=no and
redefining WARNING_MSG to actuall fail.
In the old test code, the input and output data for each test case were
in different files. This was too far apart to relate them. In addition,
all test cases were merged into a single big test, which made it hard to
tell the topics apart.