Could not wait for server fd - select (11: Resource temporarily unavailable) [IP: 138.201.228.45 443]
Description
Related Objects
Event Timeline
Similar issue:
Get:393 https://repo.pureos.net/pureos byzantium/main amd64 xvfb amd64 2:1.20.10-2 [3036 kB]
Fetched 182 MB in 5s (40.4 MB/s)
E: Failed to fetch https://repo.pureos.net/pureos/pool/main/l/llvm-toolchain-11/libllvm11_11.0.1-2_amd64.deb Error reading from server - read (5: Input/output error) [IP: 138.201.228.45 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
E: Failed to process build dependencies
Similar:
Get:289 https://repo.pureos.net/pureos byzantium/main arm64 xdg-user-dirs arm64 0.17-2 [53.2 kB]
E: Failed to fetch https://repo.pureos.net/pureos/pool/main/c/chardet/python3-chardet_4.0.0-1_all.deb Error reading from server - read (5: Input/output error) [IP: 138.201.228.45 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Fetched 94.6 MB in 9s (10.1 MB/s)
nginx error log on Artemis is saying:
2021/02/11 00:36:48 [error] 503#503: *9986273 directory index of "/srv/repo.puri.sm/" is forbidden, client: 138.201.228.45, server: artemis.pureos.net, request: "POST / HTTP/1.0", host: "artemis.pureos.net"
2021/02/11 01:17:27 [error] 503#503: *9990896 open() "/srv/repo.puri.sm/.well-known/security.txt" failed (2: No such file or directory), client: 138.201.228.45, server: artemis.pureos.net, request: "GET /.well-known/security.txt HTTP/1.0", host: "artemis.pureos.net"
2021/02/11 09:31:20 [error] 503#503: *10043922 directory index of "/srv/repo.puri.sm/" is forbidden, client: 138.201.228.45, server: artemis.pureos.net, request: "POST / HTTP/1.0", host: "artemis.pureos.net"
2021/02/11 10:46:11 [error] 503#503: *10049424 directory index of "/srv/repo.puri.sm/" is forbidden, client: 138.201.228.45, server: artemis.pureos.net, request: "POST / HTTP/1.0", host: "artemis.pureos.net"
to add some context:
we're seeing image build and CI failures when downloading packages. @mak also saw this when generating isos. Besides the above the client side also sometimes says:
Fetched 164 MB in 14s (11.4 MB/s) E: Failed to fetch https://repo.pureos.net/pureos/pool/main/libx/libxcb/libxcb-shm0_1.14-3_arm64.deb Undetermined Error [IP: 138.201.228.45 443] E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? E: Failed to process build dependencies
Switching to http:// reliably works around the problem
maybe relevant to your research:
curl -O https://repo.pureos.net/pureos/dists/amber/main/source/Sources.xz works fine,
but curl -O --cert-status https://repo.pureos.net/pureos/dists/amber/main/source/Sources.xz fails:
curl: (91) No OCSP response received
Similar for https://deb.debian.org/ succeeds.
I..e might be this issue can be narrowed down to OCSP stapling on our server.
https://stackoverflow.com/a/60243923 mentions how to disable OCSP for apt:
touch /etc/apt/apt.conf.d/99verify-peer.conf \
&& echo >>/etc/apt/apt.conf.d/99verify-peer.conf "Acquire { https::Verify-Peer false }"I tested this with https::Verify-Peer false, still the same issue happens:
Fetched 1016 MB in 4min 12s (4025 kB/s) 2021/03/04 22:30:39 apt | E: Failed to fetch https://repo.pureos.net/pureos/pool/main/f/fftw3/libfftw3-double3_3.3.8-2_amd64.deb Connection timed out [IP: 138.201.228.45 443] 2021/03/04 22:30:39 apt | E: Failed to fetch https://repo.pureos.net/pureos/pool/main/s/spice-gtk/libspice-client-glib-2.0-8_0.39-1_amd64.deb Connection timed out [IP: 138.201.228.45 443] 2021/03/04 22:30:39 apt | E: Failed to fetch https://repo.pureos.net/pureos/pool/main/libs/libsodium/libsodium23_1.0.18-1_amd64.deb Connection timed out [IP: 138.201.228.45 443] 2021/03/04 22:30:39 apt | E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
I pasted the wrong log, but the connection timeout is actually even more frequent now than the Resource temporarily unavailable issue - but both appear.
I also tried messing with timeouts on APTs transport methods, with no luck - according to APT, the server just stops responding (according to curl though, it doesn't).
Benchmark:
Server Software: nginx/1.10.3
Server Hostname: repo.pureos.net
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key: X25519 253 bits
TLS Server Name: repo.pureos.net
Document Path: /
Document Length: 381 bytes
Concurrency Level: 5
Time taken for tests: 91.208 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 518000 bytes
HTML transferred: 381000 bytes
Requests per second: 10.96 [#/sec] (mean)
Time per request: 456.042 [ms] (mean)
Time per request: 91.208 [ms] (mean, across all concurrent requests)
Transfer rate: 5.55 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 285 339 164.2 300 1784
Processing: 92 114 71.8 96 641
Waiting: 91 102 45.8 95 637
Total: 379 452 179.8 398 1886
Percentage of the requests served within a certain time (ms)
50% 398
66% 402
75% 410
80% 427
90% 547
95% 733
98% 1060
99% 1489
100% 1886 (longest request)Certificate version: 3
Valid from: Oct 7 19:21:40 2020 GMT
Valid to : Sep 29 19:21:40 2021 GMT
Public key is 2048 bits
The issuer name is /O=Digital Signature Trust Co./CN=DST Root CA X3
The subject name is /C=US/O=Let's Encrypt/CN=R3
Extension Count: 8
Peer certificate
Certificate version: 3
Valid from: Feb 27 18:12:29 2021 GMT
Valid to : May 28 18:12:29 2021 GMT
Public key is 2048 bits
The issuer name is /C=US/O=Let's Encrypt/CN=R3
The subject name is /CN=downloads.pureos.net
Extension Count: 9
Transport Protocol :TLSv1.2
Cipher Suite Protocol :TLSv1.2
Cipher Suite Name :ECDHE-RSA-AES128-GCM-SHA256
Cipher Suite Cipher Bits:128 (128)
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES128-GCM-SHA256
Session-ID: 39DB1E294804DA2D5AB727DE4CF12062B4FA46A36F9DFA278CD675B3535CE0FD
Session-ID-ctx:
Master-Key: BCF95A63D726D1B9685B5293C6212D1CBD8620E94904D9D3A4CA8B6A9EAA6CF5976F668441B9F8F4DF24A70F457C5422
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 86400 (seconds)
TLS session ticket:
0000 - b1 a5 92 f4 25 9b 67 fc-d5 c9 5e 0b 0d ba e7 5e ....%.g...^....^
0010 - 66 2e d9 f2 68 3a 4f e9-3e 00 9d 33 7b e2 66 49 f...h:O.>..3{.fI
0020 - ff 93 f6 af 6a a0 64 7b-84 eb fc 07 f1 bf 10 ba ....j.d{........
0030 - 48 55 66 ca 4a 9e 44 de-3b 5e 7b f9 e0 e9 23 6a HUf.J.D.;^{...#j
0040 - 88 6f 52 da 28 43 c3 92-2b 9a da f7 d4 f1 3b 9c .oR.(C..+.....;.
0050 - 2e 6f 9c a3 71 78 cf f2-4d e6 b1 62 16 87 c3 01 .o..qx..M..b....
0060 - 58 7d b4 9f 89 e2 e2 98-39 71 3b bd 05 06 5d 22 X}......9q;...]"
0070 - 0e b6 fc 17 2c 86 08 13-3c e3 65 24 a3 7b 45 9a ....,...<.e$.{E.
0080 - 31 10 70 30 1e d7 64 92-09 b4 10 bf 09 e9 be 10 1.p0..d.........
0090 - 18 56 32 e6 60 bf 0f 24-10 ae df 8f 48 b9 8f 48 .V2.`..$....H..H
00a0 - 1c e3 fa bc 2b a7 d2 52-da 1f cf 28 d1 01 cd 95 ....+..R...(....
00b0 - 91 6b c6 b2 9d 60 96 a1-24 51 18 92 19 c9 ab 3b .k...`..$Q.....;
Start Time: 1615226134
Timeout : 7200 (sec)
Verify return code: 20 (unable to get local issuer certificate)
Extended master secret: yes
LOG: header received:
HTTP/1.1 200 OK
Server: nginx/1.10.3
Date: Mon, 08 Mar 2021 17:55:34 GMT
Content-Type: text/html; charset=utf-8
Connection: close
<html>
<head><title>Index of /</title></head>
<body bgcolor="white">
<h1>Index of /</h1><hr><pre><a href="../">../</a>
<a href="pureos/">pureos/</a> 13-Sep-2019 21:41 -
<a href="pureos-debug/">pureos-debug/</a> 14-Sep-2019 13:10 -
</pre><hr></body>
</html>
LOG: Response code = 200
SSL/TLS Alert [read] warning:close notify
Completed 1000 requests
SSL/TLS Alert [write] warning:close notify
Finished 1000 requests
Server Software: nginx/1.10.3
Server Hostname: repo.puri.sm
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key: X25519 253 bits
TLS Server Name: repo.puri.sm
Document Path: /
Document Length: 381 bytes
Concurrency Level: 1
Time taken for tests: 442.822 seconds
Complete requests: 1000
Failed requests: 0
Keep-Alive requests: 0
Total transferred: 518000 bytes
HTML transferred: 381000 bytes
Requests per second: 2.26 [#/sec] (mean)
Time per request: 442.822 [ms] (mean)
Time per request: 442.822 [ms] (mean, across all concurrent requests)
Transfer rate: 1.14 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 285 335 153.0 299 1400
Processing: 92 107 57.6 97 526
Waiting: 91 101 37.1 96 396
Total: 378 443 165.2 396 1781
Percentage of the requests served within a certain time (ms)
50% 396
66% 399
75% 401
80% 403
90% 680
95% 698
98% 984
99% 1485
100% 1781 (longest request)What exactly was tested 1000 times? some apt tool using an up-to-date libapt, or something more low-level?
If using a different method than actual apt, then is it certain that the level of TLS validation is equal or stronger than that done by up-to-date libapt?
I notice the following in above data dump (which I guess is a sample output similar to those 1000 calls:
Verify return code: 20 (unable to get local issuer certificate)
That looks like a potential anomaly to me...
Some new observations:
- This is not a proxy server issue: Even without proxy, the issue occurs
- The TLS version doesn't matter at all
- Before the issue occurs, we get quite a few TCP retransmissions from the client to the server, and then the current connection is dropped:
- There is nothing suspicious in the Nginx logs, not even at info priority. A quick glance at the debug logs also didn't show anything interesting, but those are massive and it's possible that I missed something.
There's also a TCP reset sent from the server to the client, but I'm not sure if that's actually related to the issue - it is suspicious that this always happens before the connection dies.
So, I am pretty sure now that this is something weird on our side, the client appears to behave fine. Nginx however also appears to behave as it should, so I wonder whether we have something else in the network or an odd firewall setting which causes this.
So, that RST packet is indeed the issue, and there's a high chance that either APT or GnuTLS don't handle this correctly. I talked with an APT developer, and we may actually need to debug this further in future.
In the meanwhile though, the issue can be mitigated by throwing an Apache2 webserver in front as proxy, instead of Nginx.
I implemented that in our infrastructure, so the issue is gone for us now though. Therefore I am marking this issue as resolved, but we may actually need to open a new one against either Nginx or, more likely, APT in future to properly fix this.
Hello,
the problem still persists on my Librem 5. I cannot download updates for tzdata and certificates