| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Even if we didn't switch because we already use the requested server.
|
If we switch to a different server when we only have something in
the send list but nothing in the recv list, the send worker would
not have gotten invoked. Now we unconditionally trigger the send
worker when asked to re-queue any pending requests.
|
Using workqueues frees us from having to manage the lifecycle
of three dedicated threads. Discovery (alt server checks) and
sending keepalive packets are now done using work on the
power efficient system queue. Sending and receiving happens
via dedicated work queues with higher priority.
blk-mq has also been around for quite a while in the kernel,
so switching to it doesn't hurt backwards compatibility.
As the code is now refactored to work more as blk-mq is designed,
backwards compatibility has even improved while at the same time
freeing us from an arsenal of macros that were required to make
the blk-mq port look and feel like the old implementation.
For example, the code now compiles on CentOS 7 with kernel 3.10
without requiring special macros to detect the heavily modified
RedHat kernel with all its backported features.
A few other design limitations have been rectified along the way.
For example, switching to another server no longer disconnects from
the current one first. The old behavior could theoretically lead to
a non-working setup if the new server wasn't reachable and then,
because of some transient network error, switching back failed as
well. As the discover thread was torn down by the disconnect call,
the connection would not repair itself eventually either.
We now establish the new connection in parallel to the old one,
and only if that succeeds do we replace the old one with it,
similar to how the automatic alt-server switch already does it.
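
The connect-first, swap-on-success idea described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct conn`, `struct device`, `switch_server` and the two connect stubs are hypothetical names, not the driver's real types or functions, and the real code works on kernel sockets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's connection state. */
struct conn { int server_id; bool open; };
struct device { struct conn *active; };

/* Hypothetical connect hook, so the sketch can simulate success and failure. */
typedef bool (*connect_fn)(struct conn *c, int server_id);

/* Stub that always connects successfully. */
static bool connect_always(struct conn *c, int server_id)
{
    c->server_id = server_id;
    c->open = true;
    return true;
}

/* Stub that simulates an unreachable server. */
static bool connect_never(struct conn *c, int server_id)
{
    (void)c;
    (void)server_id;
    return false;
}

/*
 * Bring up `candidate` while the active connection stays in place.
 * Only if that succeeds is the old connection closed and replaced.
 * Returns true if the device now talks to `server_id`.
 */
static bool switch_server(struct device *dev, struct conn *candidate,
                          int server_id, connect_fn do_connect)
{
    assert(dev != NULL && candidate != NULL && do_connect != NULL);
    if (dev->active && dev->active->open &&
        dev->active->server_id == server_id)
        return true;                 /* already using the requested server */
    if (!do_connect(candidate, server_id))
        return false;                /* old connection is left untouched */
    if (dev->active)
        dev->active->open = false;   /* retire the old connection last */
    dev->active = candidate;
    return true;
}
```

With this ordering, a failed switch leaves the device exactly as it was, so no "switch back" step is needed.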
|
This spams scary red errors to dmesg when really an unreachable alt
server isn't that much of a deal during normal operation. Change the
log level to debug instead.
Might even consider not printing anything at all.
|
Any request from a client being relayed to an uplink server will have
its size extended to this value. It will also be applied to background
replication requests, if the BGR mode is FULL.
As request coalescing is currently very primitive, this setting should
usually be left disabled, and bgrWindowSize used instead, if appropriate.
If you enable this, set it to something large (1M+), or it might have
adverse effects.
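
A rough sketch of what extending a relayed request to a minimum size could look like. The function name `relay_length`, the parameter `min_size` (0 meaning disabled) and the clamping to the end of the image are assumptions made for illustration; the server's actual logic may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the length to request from the uplink for a client request of
 * `length` bytes at `offset`. Requests shorter than `min_size` are
 * extended to it, but never past the end of the image (an assumption of
 * this sketch). A `min_size` of 0 leaves the request unchanged.
 */
static uint32_t relay_length(uint64_t offset, uint32_t length,
                             uint32_t min_size, uint64_t image_size)
{
    uint64_t remaining;

    assert(offset <= image_size && length <= image_size - offset);
    if (min_size == 0 || length >= min_size)
        return length;
    remaining = image_size - offset;
    /* remaining < min_size here means it also fits into 32 bits */
    return remaining < min_size ? (uint32_t)remaining : min_size;
}
```

For example, a 4 KiB read at offset 0 with a 1 MiB minimum becomes a 1 MiB uplink request, while a request that is already larger than the minimum passes through unchanged.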
|
When establishing a new connection on a disconnected device, the old
list of alt-servers was retained. This would lead to us connecting to
the wrong server, as the number of newly passed servers was used when
looping over the list of alt-servers to actually connect.
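
One way to avoid this class of bug is to wipe the stale list before storing the newly passed servers, so that index `i < count` always refers to a new entry. The sketch below assumes a fixed-size table; `MAX_ALTS`, `struct alt_server` and `set_alt_servers` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ALTS 4 /* hypothetical table size */

struct alt_server { char host[32]; int in_use; };
struct device { struct alt_server alts[MAX_ALTS]; };

/*
 * Replace the alt-server table with the `count` newly passed hosts.
 * Stale entries from a previous connection are cleared first, so a
 * later loop over the first `count` slots only sees new servers.
 * Returns the number of servers actually stored.
 */
static int set_alt_servers(struct device *dev, const char *const *hosts,
                           int count)
{
    assert(dev != NULL && hosts != NULL && count >= 0);
    memset(dev->alts, 0, sizeof(dev->alts));
    if (count > MAX_ALTS)
        count = MAX_ALTS;
    for (int i = 0; i < count; i++) {
        snprintf(dev->alts[i].host, sizeof(dev->alts[i].host), "%s", hosts[i]);
        dev->alts[i].in_use = 1;
    }
    return count;
}
```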
|
Incoming requests from clients might actually be prefetch jobs from
another downstream proxy. Don't do prefetching for those, as this would
cascade upwards in the proxy chain (prefetch for a prefetch of a prefetch).
Incoming requests might also be background replication. Don't relay
those if we're not configured for background replication as well.
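
The two rules above boil down to a pair of flag checks. The sketch below assumes that requests carry a flags byte marking prefetch and background-replication jobs; the flag names and values are hypothetical and do not claim to match the wire protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits on an incoming request. */
#define REQ_FLAG_PREFETCH 0x01u /* sender issued this as a prefetch job */
#define REQ_FLAG_BGR      0x02u /* sender issued this for background replication */

struct request { uint8_t flags; };

/* Never prefetch on behalf of a request that is itself a prefetch. */
static bool may_prefetch(const struct request *r)
{
    assert(r != NULL);
    return (r->flags & REQ_FLAG_PREFETCH) == 0;
}

/* Relay replication requests only if we replicate in the background too. */
static bool may_relay(const struct request *r, bool bgr_enabled)
{
    assert(r != NULL);
    return (r->flags & REQ_FLAG_BGR) == 0 || bgr_enabled;
}
```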
|
This will send all (block) requests immediately, at the cost of sometimes
more overhead, but with slightly less delay. Since the outgoing connection on a
client is only used very lightly, this tradeoff should always make
sense.
|
There is a race condition where we process the next request from the
same client faster than the OS will schedule the async prefetch job,
rendering it a NOOP in the best case (request ranges match) or fetching
redundant data from the upstream server (prefetch range is larger than
actual request by client). Make prefetching synchronous to prevent this
race condition.
|
If you need daemon mode, run as root with --daemon. Normal users can
then request devices to be connected using the same binary WITHOUT
having the suid bit set on it.
|
- Remove the ugly timeout hack that apparently isn't required after all.
- Set a few socket options that appear to make sense in our use case (no
linger, only one SYN retry, NODELAY).
- Adapt socket timeout in panic mode, in case we're on a very bad
connection.
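
For readers unfamiliar with the options named in the list above, here is a userspace approximation using `setsockopt` on Linux. It is not the commit's code: the helper name `tune_socket` is hypothetical, "no linger" is read here as turning `SO_LINGER` off, and the timeout is simply a parameter that a caller could vary in panic mode.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Apply the socket options to a TCP socket: no lingering on close,
 * a single SYN retry, Nagle disabled, and send/receive timeouts of
 * `timeout_sec` seconds. Returns 0 on success, -1 on the first failure.
 */
static int tune_socket(int fd, int timeout_sec)
{
    struct linger lg = { .l_onoff = 0, .l_linger = 0 };
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int syn_retries = 1;
    int nodelay = 1;

    assert(fd >= 0 && timeout_sec > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -1;
    /* TCP_SYNCNT is Linux-specific */
    if (setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &syn_retries, sizeof(syn_retries)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        return -1;
    return 0;
}
```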
|
Remove superfluous, redundant or otherwise useless
information. Use space as separator instead of comma
for better readability and easier parsing in shell etc.
|
Similar logic already exists in the fuse client:
Count how many times in a row a server was fastest when
measuring RTTs, and lower the switching threshold
more the higher the count gets.
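
A minimal sketch of such a streak-based threshold. The constants, the struct and both function names are invented for illustration; neither the fuse client's nor the kernel module's actual values are known from this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning constants, in microseconds. */
#define BASE_THRESHOLD_US 1500u
#define STEP_US            250u
#define MAX_STREAK           5u /* keeps the threshold above zero */

struct switch_state {
    int best_server;  /* server that was fastest in the last round */
    unsigned streak;  /* how many rounds in a row it was fastest */
};

/* Record which server had the lowest RTT in this measurement round. */
static void record_fastest(struct switch_state *s, int server)
{
    assert(s != NULL);
    if (s->best_server == server) {
        if (s->streak < MAX_STREAK)
            s->streak++;
    } else {
        s->best_server = server;
        s->streak = 1;
    }
}

/*
 * Switch only if the best server beats the current one by more than the
 * threshold, which shrinks the longer the same server has been fastest.
 */
static bool should_switch(const struct switch_state *s,
                          uint32_t current_rtt_us, uint32_t best_rtt_us)
{
    uint32_t threshold;

    assert(s != NULL && s->streak <= MAX_STREAK);
    threshold = BASE_THRESHOLD_US - STEP_US * s->streak;
    return best_rtt_us + threshold < current_rtt_us;
}
```

A single good measurement therefore needs a large RTT advantage to trigger a switch, while a server that keeps winning eventually gets chosen even with a modest advantage.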
|
Convert dnbd3_host_t to struct sockaddr immediately when
adding alt servers, so we don't have to convert it every time
we establish a connection. Additionally we can now use %pISpc
in printf-like functions instead of having if/else constructs
whenever we want to print an address.
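
The convert-once idea can be illustrated in userspace. `%pISpc` itself only exists in the kernel's printk family, so the sketch formats the address by hand in the same style (`1.2.3.4:5003`, `[::1]:5003`). `struct host` and its constants are hypothetical stand-ins for `dnbd3_host_t`, whose real layout is not shown in this log.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for dnbd3_host_t. */
enum { HOST_V4 = 1, HOST_V6 = 2 };
struct host {
    uint8_t addr[16]; /* IPv4 uses the first 4 bytes */
    uint16_t port;    /* network byte order */
    uint8_t type;     /* HOST_V4 or HOST_V6 */
};

/* Convert once, when the server is added. Returns 0 on success. */
static int host_to_sockaddr(const struct host *h, struct sockaddr_storage *out)
{
    assert(h != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (h->type == HOST_V4) {
        struct sockaddr_in *sin = (struct sockaddr_in *)out;
        sin->sin_family = AF_INET;
        sin->sin_port = h->port;
        memcpy(&sin->sin_addr, h->addr, 4);
        return 0;
    }
    if (h->type == HOST_V6) {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)out;
        sin6->sin6_family = AF_INET6;
        sin6->sin6_port = h->port;
        memcpy(&sin6->sin6_addr, h->addr, 16);
        return 0;
    }
    return -1; /* unknown address type */
}

/* Userspace approximation of printing with %pISpc. */
static int format_sockaddr(const struct sockaddr_storage *ss, char *buf, size_t len)
{
    char ip[INET6_ADDRSTRLEN];

    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ss;
        if (!inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)))
            return -1;
        return snprintf(buf, len, "%s:%u", ip, (unsigned)ntohs(sin->sin_port));
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ss;
        if (!inet_ntop(AF_INET6, &sin6->sin6_addr, ip, sizeof(ip)))
            return -1;
        return snprintf(buf, len, "[%s]:%u", ip, (unsigned)ntohs(sin6->sin6_port));
    }
    return -1;
}
```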
|
This avoids automatically switching back right after adding
and switching to a server.
|
This change prevents the dnbd3-client from printing the help text twice if
the help parameter is passed. In addition to that, correct exit codes
are set after the help text is printed and the program terminates.
|
Formerly, the request that was about to be received was looked up in
the receive queue without removing it, then the request payload was
received from the socket while the lock was not being held, and finally,
the lock was acquired again and the request removed from the queue.
This is dangerous as another thread can concurrently take the request
from the queue while the receive thread reads the payload from the
socket, leading to a double-free by calling blk_mq_end_request twice.
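
The safe pattern implied by this description is to look up and unlink the request in a single critical section, so that whoever finds it owns it exclusively while the payload is read. The sketch below uses a pthread mutex and a singly linked list as userspace stand-ins; the kernel module's actual locking primitives and list types are not shown in this log, and all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct request {
    uint64_t handle;       /* identifies the reply this request waits for */
    struct request *next;
};

struct recv_queue {
    pthread_mutex_t lock;
    struct request *head;
};

/*
 * Find the request with the given handle and remove it from the queue
 * while holding the lock. After this returns, no other thread can find
 * the request in the queue, so only the caller can complete it.
 * Returns NULL if no such request is queued.
 */
static struct request *take_request(struct recv_queue *q, uint64_t handle)
{
    struct request *found = NULL;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    for (struct request **pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->handle == handle) {
            found = *pp;
            *pp = found->next;  /* unlink before dropping the lock */
            found->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return found;
}
```

A second lookup for the same handle returns NULL, which is what prevents the request from being completed twice.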
|