doFilter() gets extended to take (<full name>, <vcard>) tuples in
addition to the full name alone. Then this is used to create one large
vCard that is suitable for testing (field content unique, all fields
set, etc.) with various filters. All field tests are covered with at
least one positive and one negative case.
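The tuple extension can be sketched roughly like this (Python; all names are illustrative, not the actual test helper):

```python
# Illustrative sketch only: normalize doFilter() entries so that a plain
# full name and a (full name, vcard) tuple are both accepted.
def normalize_entries(entries):
    """Return a list of (fullname, vcard) tuples.

    A plain string is treated as a full name without vCard data.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, tuple):
            normalized.append(entry)
        else:
            normalized.append((entry, None))
    return normalized
```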
The operation is a runtime parameter of different classes, whereas
extracting the right values to compare via the operation is hard-coded
at compile time. This is a rather arbitrary compromise between code
duplication, simplicity and performance (which, in fact, was not
measured at all).
The code for selecting case-sensitivity and the normalization before
the string operations is shared with the older 'any-contains'
operation.
The new TestContacts.testFilterLogic uses the same infrastructure as
the language tests and covers some combinations of 'and' and 'or'.
In some cases, Python lists had to be avoided in favor of tuples,
because Python cannot automatically map a list to an 'av' array of
variants, whereas a tuple, which enumerates the types of its members,
can be sent and is accepted by the PIM Manager.
This changes the signature of the filter parameter in Search(),
RefineSearch() and ReplaceSearch() from 'as' (array of strings) to
'av' (array of variants). Allowed entries in the variant are arrays
containing strings and/or other such arrays (recursive!).
A single string as value is not supported, which is the reason why
'av' instead of just plain 'v' makes sense (mismatches can already be
found in the sender, potentially at compile time). It also helps
Python choose the right type when asked to send an empty list. When
just using 'v', Python cannot decide automatically.
Error messages include a backtrace of all terms that the current,
faulty term was included in. That helps to locate the error in a
potentially larger filter.
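The term backtrace could work roughly like the following sketch (Python; the operation names and error format are invented for illustration):

```python
def validate_term(term, enclosing=()):
    """Validate a nested filter term; report all enclosing terms on error."""
    here = enclosing + (term,)
    if not isinstance(term, (list, tuple)) or not term:
        raise ValueError("invalid term %r, enclosed in %r" % (term, enclosing))
    op = term[0]
    if op in ("and", "or"):
        # logical terms contain further terms (recursive!)
        for sub in term[1:]:
            validate_term(sub, here)
    elif op in ("is", "contains"):
        if len(term) != 3:
            raise ValueError("term %r needs a field and a value, enclosed in %r"
                             % (term, enclosing))
    else:
        raise ValueError("unknown operation %r, enclosed in %r" % (op, enclosing))
```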
The scope of supported searches has not changed (yet).
The signature of the new type is just a 'v' for variant. Recursion is
only supported in combination with std::vector. This is done to keep
implementation simpler. std::vector was chosen over other collections
because it is easier to work with in method calls, the main purpose
of the new type.
The conversion from D-Bus messages accepts anything which can be mapped
into the variant: arrays, tuples, plain types, and of course variants
containing those. This helps with backward compatibility (one can
turn an interface which took a fixed type like std::vector<std::string>
into something with a recursive variant) and with Python, because
Python sends arrays and tuples without converting into variants
when the app uses those simpler types.
One caveat exists with sending an empty list in Python: when using 'v'
as signature in the interface, Python doesn't know how to send the
empty list without assistance from the programmer (as in dbus.Array([],
signature='s')). If possible, avoid the plain 'v' in an interface in
favor of something like 'av' (= std::vector<boost::variant>).
There is a difference in EDS between removing the database definition
from the ESourceRegistry (which makes the data inaccessible via EDS)
and removing the actual database. EDS itself only removes the definition
and leaves the data around to be garbage-collected eventually. This is
not what we want for the PIM Manager API; the API makes a stronger
guarantee that data is really gone.
Fixed by introducing a new mode flag for the deleteDatabase() method
and deleting the directory of the source directly in the EDS backend,
if requested by the caller.
The syncevolution command line tool will use the default mode and thus
keep the data around, while the PIM Manager forces the removal of
data.
Instead of hard-coding a specific "Backend Summary Setup" in
SyncEvolution, copy the config of the system database. That way
special flags (like the desired "Backend Summary Setup" for local
address books) can be set on a system-wide basis and without having to
modify or configure SyncEvolution.
Because EDS has no APIs to clone an ESource or turn a .source file
into a new ESource, SyncEvolution has to resort to manipulating and
creating the keyfile directly.
When an asynchronous method call failed with an exception, the
DBusResult got destructed before the exception was handled, and thus
a generic, uninformative error was sent back to the caller. A test
demonstrating that for the PIM Manager SetPeer() with an invalid UID
will be committed separately.
To fix this, the DBusResult must know whether it is really responsible
for replying to the method message. The method call handlers
themselves do not know whether any of their parameters is a
DBusResult, so we have to activate the automatic reply indirectly, via
a guard instance with the same life time as the handler.
When the guard destructs, it can check whether the handler has
replied, because the handler then clears the message pointers. If it
hasn't, the DBusResult becomes the owner of the message. For this to
work, an extra try/catch block is necessary. Otherwise exception
handling would still occur too late, at a time when the guard is gone
and the DBusResult has sent the generic error.
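The ownership handover can be modeled with a small sketch (Python stand-ins for the C++ classes; the real DBusResult code differs):

```python
class Handler:
    """Stand-in for a method call handler; clears the message on reply."""
    def __init__(self, message):
        self.message = message
    def reply(self):
        self.message = None  # marks "already replied"

class Result:
    """Stand-in for DBusResult; may become owner of the unanswered message."""
    def __init__(self):
        self.owned = None
    def take_ownership(self, message):
        self.owned = message

class ReplyGuard:
    """Shares the handler's lifetime; on exit, hands the message to the
    result if the handler has not replied yet."""
    def __init__(self, handler, result):
        self.handler, self.result = handler, result
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        if self.handler.message is not None:
            # handler never replied: the result becomes owner of the message
            self.result.take_ownership(self.handler.message)
            self.handler.message = None
        return False  # let exceptions propagate to the caller
```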
Some other classes had to be adapted to allow for this extra level of
indirection. While doing so, another bug was found.
Retrieving the interface and member used when calling a method
returned the path instead, due to a cut-and-paste error. Fixed. Not
used in practice.
Having to look for XDG config files in two different locations was
confusing. The XDG_DATA_HOME=temp-testpim/local/cache was also wrong,
because "cache" is for XDG_CACHE_HOME.
Now XDG_CONFIG/CACHE/DATA_HOME must be set to
temp-testpim/[config|cache|data], which is consistent with the usage
of private XDG dirs in test-dbus.py, and the directories are shared by
EDS and PIM Manager.
EDS keeps running throughout all tests. This must be taken into
account when cleaning temp-testpim. evolution-source-registry does not
cope with removal of its "sources" directory, so we have to keep that,
although we are removing its content. evolution-addressbook-factory
may continue to have an ESource open although its client *and* the
source are gone (should be fixed in EDS by tracking clients *and*
source removals!).
When that happens and we reuse a DB UUID,
evolution-addressbook-factory continues to use the deleted instance,
leading to IO errors. To avoid that inside a test run, we use
different UID prefixes for each test, by embedding the test name. To
avoid it between test runs, kill evolution-addressbook-factory
first. When killing evolution-source-registry, use -9, because
otherwise it shuts down too slowly, which may lead to test failures
when it talks to the closing instance.
For the system address book we can't change the UID between tests, so
we must not delete it.
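A per-test UID prefix could be constructed like this (illustrative only; the exact scheme used by the tests may differ):

```python
def make_db_uid(test_name, index):
    """Embed the test name in the database UID so that each test uses
    fresh databases and a stale ESource cannot be reused."""
    return "test-dbus-%s-%d" % (test_name.replace("_", "-").lower(), index)
```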
As with glib before, now GeeCollCXX also makes the parameter for declaring the
pointer ownership mandatory. The ensuing code analysis showed that in two
cases involving gee_multi_map_get() the transfer of ownership was not handled
correctly, leading to memory leaks.
Traditionally, glib messages were caught via LogRedirect.
Later, they got redirected by installing a handler. That change
disabled the message filtering for known harmless error messages
in LogRedirect, which caused messages from folks to show up
in TestCmdline D-Bus tests randomly, depending on the timing.
Reestablish the filtering by checking it in the glog callback.
To minimize changes, keep the error registry in LogRedirect.
It must be thread-safe now.
The D-Bus method reply may occur when the calling instance of
LocalTransportAgent is already gone. Therefore bind to a weak
pointer instead of this.
Seen when testing against Apple Calendar Server in the nightly
testing. It was a bit uncertain from the logs why it happened,
but it is clear that it can happen and thus the patch makes
sense even though it might not address the real problem.
Explicitly free the engine before releasing our lock on shutdown
signals. This prevents getting killed by a signal while we are still
freeing resources, which can take some time when a background thread
is involved (seen in some tests covering that).
The D-Bus TestHTTP tests occasionally failed because a message was resent
due to rounding issues. Be a bit (= 0.1 seconds) more conservative about
resending to avoid this.
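The conservative resend check amounts to something like this (sketch; names invented):

```python
def due_for_resend(elapsed, retry_interval, margin=0.1):
    # Only resend when clearly past the interval, so that rounding of the
    # elapsed time cannot trigger a premature resend.
    return elapsed >= retry_interval + margin
```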
Sometimes an extra "quiescent" signal was seen. It came from the
FolksAggregator before reporting any contacts. Not exactly sure
why that happens, but because it is not the responsibility of this
test to detect that, let's ignore it here.
We have to hard code this instead of always using it because ICU does
not properly fall back to non-phonebook defaults. Without phonebook as
collation, sorting was not done correctly in Germany.
Now also pay attention to case and punctuation. Previously these were
ignored for performance reasons. The reasoning that case doesn't
matter was wrong, it is useful as a tie breaker (which is what the
higher level does).
Full interleaving of Pinyin transliterations of Chinese names with
Western names can be done by doing an explicit Pinyin transliteration
as part of computing the sort keys.
This is done using ICU's Transliteration("Han-Latin"), which we have
to call directly because boost::locale does not expose that API.
We hard-code this behavior for all "zh" languages (as identified by
boost::locale), because by default, ICU would sort Pinyin separately
from Western names when using the "pinyin" collation.
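The effect on sorting can be shown with a toy stand-in for ICU's Transliterator("Han-Latin") (a tiny invented mapping, not actual ICU behavior):

```python
# Toy mapping standing in for ICU's Han-Latin transliteration; it shows
# how transliterating before computing the sort key interleaves Pinyin
# with Western names.
HAN_LATIN = {"\u738b": "wang", "\u674e": "li"}  # 王, 李

def sort_key(name):
    return "".join(HAN_LATIN.get(ch, ch) for ch in name.lower())
```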
The SyncPeer() result is derived from the sync statistics. To have
them available, the "sync done" signal must include the SyncReport.
Start and end of a sync could already be detected; emitting "modified"
signals while a sync runs depends on a new signal inside the
SyncContext, sent when switching from one cycle to the next and at the
end of the last one.
Following the boost::intrusive_ptr example and making "add_ref =
true" the default in our CXX GLib and GObject wrappers led to some
memory leaks because it didn't enforce thinking about whether the
plain pointer is already owned by us or not.
It is better to use a mandatory enum value, ADD_REF and TRANSFER_REF,
and force explicit construction. Doing that revealed that the
assignment operator was implemented as constructing a CXX instance
with increased ref count and/or that in some places, a real leak was
caused by increasing the ref count unnecessarily.
Running under valgrind gave a false sense of security. Some of the
real leaks only showed up randomly in tests.
This mimics the behavior of boost smart pointers and is useful for
code where variables can be of either type, because "= NULL" is
not supported by our boost::intrusive_ptr wrapper.
Avoid referencing pathProps->second when the set of paths that
PROPFINDs returns is empty. Apparently this can happen in combination
with Calypso.
The stack backtrace sent via email looked like this:
Program received signal SIGSEGV, Segmentation fault.
0x4031a1a0 in std::_Rb_tree<std::string, std::pair<std::string const, std::string>, std::_Select1st<std::pair<std::string const, std::string> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >::find(std::string const&) const () from /usr/lib/libsyncevolution.so.0
0x4031a1a0 <_ZNKSt8_Rb_treeISsSt4pairIKSsSsESt10_Select1stIS2_ESt4lessISsESaIS2_EE4findERS1_+60>: ldr r4, [r0, #-12]
(gdb) bt
from /usr/lib/syncevolution/backends/syncdav.so
from /usr/lib/syncevolution/backends/syncdav.so
from /usr/lib/libsyncevolution.so.0
Include the pid of the child process in all messages relating to it,
to help in cases where the same program is started multiple times.
Log something in the parent when destroying the fake API instance and
thus canceling the fake method call from the child, because that will
trigger a "lost connection" message in the child which was hard to
correlate with events in the server without those additional log
messages in the parent.
Sessions created via the Server.Connect() API were announced via
signals, but not activated. Therefore querying their status or
aborting them via D-Bus was not possible. Found while writing the
TestHTTP.testAbortThread D-Bus test.
Since the introduction of SuspendFlags, the only remaining user
of the virtual aspect of checkForSuspend/checkForAbort() was the
testing code. By using the suspend/abort request mechanism in
SuspendFlags, it becomes possible to move checkForSuspend/Abort()
into SuspendFlags itself.
This will be useful for using them outside of a SyncContext member.
test-dbus.py now knows how to start syncevo-http-server. Therefore
it can test normal HTTP-based syncing as well as several scenarios
which fail or succeed with a slow server depending on the server's
ability to send SyncML messages while still initializing the storage.
To make the server slow, env variables are checked by the file
backend. It may matter whether open() or listAll() are slow, so test
both.
The tests expecting the 2 minute default must check whether the
feature is enabled at all in the binary that they are testing. If not,
the test cannot run. All other tests work, albeit somewhat unsafely
because they force the engine to run multithreaded when the engine was
compiled without mutex locking of global data structures.
We need to keep the glib event loop running while the main thread
waits for the background thread to finish. Otherwise code using
glib events (like EDS) may get stuck (observed with Sleep()
when using g_timeout_add() and synchronous calls in EDS 3.6).
We also need to watch for abort requests. When aborted, we give up
waiting for the thread and tell the engine to proceed. This will
eventually return control to us, where we can shut down.
When libsynthesis shuts down, it'll wait forever for the background
thread's shutdown. In this case we must wait and hope that the background
thread completes or aborts by itself.
HTTP SyncML clients give up after a certain timeout (SyncEvolution
after RetryDuration = 5 minutes by default, Nokia e51 after 15
minutes) when the server fails to respond.
This can happen with SyncEvolution as server when it uses a slow
storage with many items, for example via WebDAV. In the case of slow
session startup, multithreading is now used to run the storage
initializing in parallel to sending regular "keep-alive" SyncML
replies to the client.
By default, these replies are sent every 2 minutes. This can be
configured with another extension of the SyncMLVersion property:
SyncMLVersion = REQUESTMAXTIME=5m
Other modes do not use multithreading by default, but it can be
enabled by setting REQUESTMAXTIME explicitly. It can be disabled
by setting the time to zero.
The new feature depends on a libsynthesis with multithreading enabled
and glib >= 2.32.0, which is necessary to make SyncEvolution itself
thread-safe. With an older glib, multithreading is disabled, but can
be enabled as a stop-gap measure by setting REQUESTMAXTIME explicitly.
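A value like "5m" presumably gets parsed into seconds along these lines (illustrative sketch, not the actual SyncEvolution parser):

```python
def parse_request_max_time(value):
    """Parse durations like '5m', '120s', '2h', or plain seconds.

    Illustrative only; the real parser may accept other forms.
    """
    units = {"s": 1, "m": 60, "h": 3600}
    value = value.strip()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)
```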
Only one thread may handle events in the default context at any point
in time. If a second thread calls g_main_context_iteration() or
g_main_loop_run(), it blocks until the other thread releases ownership
of the context. In that case, the first thread may wake up because of
an event that the second thread waits for, in which case the second
thread may never wake up. See
https://mail.gnome.org/archives/gtk-list/2013-April/msg00040.html
This means that SyncEvolution can no longer rely on these functions
outside of the main thread. This affects Sleep() and the EDS backend.
As an interim solution, take over permanent ownership of the default
context in the main thread. This prevents fights over the ownership
when the main thread enters and leaves the main loop
repeatedly. Utility code using the main context must check for
ownership first and fall back to some other means when not the owner.
The assumption for the fallback is that the main thread will drive the
event loop, so polling with small delays for the expected status
change (like "view complete" in the EDS backend) is going to succeed
eventually.
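The polling fallback boils down to a loop like this (Python sketch of the strategy, not the actual EDS backend code):

```python
import time

def wait_for(condition, poll_interval=0.01, timeout=5.0):
    """Poll for an expected state change instead of iterating the main
    loop; assumes the main thread drives the real event loop meanwhile."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    return condition()
```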
A better solution would be to have one thread running the event loop
permanently and push all event handling into that thread. There is C++
utility code for such things in:
http://cxx-gtk-utils.sourceforge.net/2.0/index.html
See in particular the TaskManager class and its
make_task_when()/make_task_compose()/make_task_when_full() functions
for executing asynchronous results via a glib main loop, also the
Future::when() method and a number of other similar things in the
library.
Refactored the code into a new utility base class for use in other
tests.
Replace pipes with temporary files, using the same base name as the
traditional .syncevo.log and .dbus.log. They are numbered (because the
command line might be run multiple times per test) and use .out, .err,
or .outerr as suffix depending on what kind of output they
contain. The advantage is that the output gets recorded permanently.
Use that when waiting for command completion times out: in that case,
the content of the output file(s) gets added to the exception.
The subprocess handle returned by startCmdline() is extended with
information about output redirection that is then used by
finishCmdline(). This makes one parameter of finishCmdline()
redundant.
It means "the remote was initiated", not "we were initiated by the
remote side". Somewhat unfortunate choice of a variable name, but
leave it for now...
Use the monotonic system time. Timespec has sub-second resolution and
by using the monotonic time, we are independent of time corrections.
If retryDuration is smaller than retryInterval, then time out after
retryDuration instead of waiting once for retryInterval and then
checking retryDuration.
When the request timed out *exactly* after retryDuration or the next
resend would happen after the retryDuration, then give up immediately.
When the helper shuts down normally after the parent is done with it,
there was a race between handling the signal and the loss of connection,
leading to random "return 1" errors. This showed up in nightly testing
in particular in the test-dbus.py testConfigure.
The loss of connection at that point is not an error, so don't treat it
as one and simply return 0.
During normal operation, the helper's messages were never printed to any
output stream. They only went out via D-Bus. This was surprising when
debugging syncevo-dbus-server and its interaction with syncevo-dbus-helper.
It's better to let the parent print the helper's output if the helper
doesn't do it itself, so that console output is the same with and without
SYNCEVOLUTION_DEBUG.
When a sync in syncevo-dbus-helper was aborted, the process did not
wait in main() for the server to acknowledge the D-Bus method
response. If the process quit too early, the parent did not get the
method response, leading to an "internal error" instead of "aborted"
as result of the sync.
The reason is that the acknowledgement was meant to be done via
SIGTERM, but that had already been sent and thus syncevo-dbus-helper
did not wait. Solved by using SIGURG for the acknowledgement.
In addition, the parent must not close its connection when sending
signals to the child, because the child may still need to send its
sync report and/or log output signals.
D-Bus objects in a process exist independently from a specific D-Bus
connection. They are only identified by their path. When syncevo-dbus-server
forked multiple syncevo-dbus-helper instances, it indirectly (in ForkExec)
created multiple callback objects for the child's "watch" functionality (a
pending method call that the parent never replies to). These method calls were
associated with a random (oldest?) session instead of the current one.
This causes problems once an older session terminates and the callback object
destructs, because then the wrong children get a failure response, which they
treat as "connection lost = parent has terminated". Found while trying to fix
shutdown race conditions.
The solution is to generate a unique counter for each child and communicate
that counter to the child via a new env variable.
It can be useful to use additional signals like SIGURG or SIGIO
(typically not used elsewhere) for simple interprocess
communication. SuspendFlags now supports that by letting the process
catch more than just SIGINT and SIGTERM and recording which signals
were received.
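In plain Python the same idea looks like this (a sketch of recording extra signals, not SuspendFlags itself; POSIX only):

```python
import os
import signal

received = set()

def record(signum, frame):
    # just record which signal arrived; real work happens elsewhere
    received.add(signum)

# catch an "extra" signal in addition to the usual SIGINT/SIGTERM
signal.signal(signal.SIGURG, record)
os.kill(os.getpid(), signal.SIGURG)  # simple interprocess notification
```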
activate() calls can happen when already active, for example when
syncevo-dbus-helper instantiates SyncContext, which activates signal
handling for the duration of the sync. This was not handled well
before, leaking previously allocated FDs and not restoring signal
handlers correctly. Because the process quit anyway soon, this did not
matter in practice.
Now the code explicitly checks for this and returns the existing Guard
in all following activate() calls, without changing the signal
handling.
Logging must be thread-safe, because the glib log callback may be
called from arbitrary threads. This becomes more important with EDS
3.8, because it shifts the execution of synchronous calls into
threads.
Thread-safe logging will also be required for running the Synthesis
engine multithreaded, to overlap SyncML client communication with
preparing the sources.
To achieve this, the core Logging module protects its global data with
a recursive mutex. A recursive mutex is used because logging calls
themselves may be recursive, so ensuring single-lock semantics would be
hard.
Ref-counted boost pointers are used to track usage of Logger
instances. This allows removal of an instance from the logging stack
while it may still be in use. Destruction then will be delayed until
the last user of the instance drops it. The instance itself must be
prepared to handle this.
The Logging mutex is available to users of the Logging module. Code
which holds the logging mutex should not lock any other mutex, to
avoid deadlocks. The new code is a bit fuzzy on that, because it calls
other modules (glib, Synthesis engine) while holding the mutex. If
that becomes a problem, then the mutex can be unlocked, at the risk of
leading to reordered log messages in different channels (see
ServerLogger).
Different approaches are used to make all loggers follow the new
rules.
Loggers like the one in the local transport child which use a parent
logger and an additional ref-counted class like the D-Bus helper keep
a weak reference to the helper and lock it before use. If it is gone
already, the second logging part is skipped. This is the recommended
approach.
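The weak-reference pattern for such loggers can be sketched as follows (Python's weakref standing in for a boost weak pointer; class names invented):

```python
import weakref

class HelperLogger:
    def __init__(self):
        self.lines = []
    def log(self, msg):
        self.lines.append(msg)

class ChildLogger:
    """Logs via an additional helper, but only while it is still alive."""
    def __init__(self, helper):
        self._helper = weakref.ref(helper)  # weak: do not keep helper alive
    def log(self, msg):
        helper = self._helper()  # "lock" the weak reference
        if helper is not None:
            helper.log(msg)
        # else: helper already gone, skip that part of the logging
```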
In cases where introducing ref-counting for the second class would
have been too intrusive (Server and SessionHelper), a fake
boost::shared_ptr without a destructor is used as an intermediate step
towards the recommended approach. To avoid race conditions while the
instance these fake pointers refer to destructs, an explicit
"remove()" method is necessary which must hold the Logging
mutex. Using the potentially removed pointer must do the same. Such
fake ref-counted Loggers cannot be used as parent logger of other
loggers, because then remove() would not be able to drop the last
reference to the fake boost::shared_ptr.
Loggers with fake boost::shared_ptr must keep a strong reference,
because no-one else does. The goal is to turn this into weak
references eventually.
LogDir must protect concurrent access to m_report and the Synthesis
engine.
The LogRedirectLogger assumes that it is still the active logger while
disabling itself. The remove() callback method will always be invoked
before removing a logger from the stack.