Could not wait for server fd - select (11: Resource temporarily unavailable) [IP: 138.201.228.45 443]
Similar issue:
Get:393 https://repo.pureos.net/pureos byzantium/main amd64 xvfb amd64 2:1.20.10-2 [3036 kB]
Fetched 182 MB in 5s (40.4 MB/s)
E: Failed to fetch https://repo.pureos.net/pureos/pool/main/l/llvm-toolchain-11/libllvm11_11.0.1-2_amd64.deb Error reading from server - read (5: Input/output error) [IP: 138.201.228.45 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
E: Failed to process build dependencies
Similar:
Get:289 https://repo.pureos.net/pureos byzantium/main arm64 xdg-user-dirs arm64 0.17-2 [53.2 kB]
E: Failed to fetch https://repo.pureos.net/pureos/pool/main/c/chardet/python3-chardet_4.0.0-1_all.deb Error reading from server - read (5: Input/output error) [IP: 138.201.228.45 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Fetched 94.6 MB in 9s (10.1 MB/s)
The nginx error log on Artemis says:
2021/02/11 00:36:48 [error] 503#503: *9986273 directory index of "/srv/repo.puri.sm/" is forbidden, client: 138.201.228.45, server: artemis.pureos.net, request: "POST / HTTP/1.0", host: "artemis.pureos.net"
2021/02/11 01:17:27 [error] 503#503: *9990896 open() "/srv/repo.puri.sm/.well-known/security.txt" failed (2: No such file or directory), client: 138.201.228.45, server: artemis.pureos.net, request: "GET /.well-known/security.txt HTTP/1.0", host: "artemis.pureos.net"
2021/02/11 09:31:20 [error] 503#503: *10043922 directory index of "/srv/repo.puri.sm/" is forbidden, client: 138.201.228.45, server: artemis.pureos.net, request: "POST / HTTP/1.0", host: "artemis.pureos.net"
2021/02/11 10:46:11 [error] 503#503: *10049424 directory index of "/srv/repo.puri.sm/" is forbidden, client: 138.201.228.45, server: artemis.pureos.net, request: "POST / HTTP/1.0", host: "artemis.pureos.net"
To add some context:
We're seeing image build and CI failures when downloading packages. @mak also saw this when generating ISOs. Besides the above, the client side sometimes also says:
Fetched 164 MB in 14s (11.4 MB/s)
E: Failed to fetch https://repo.pureos.net/pureos/pool/main/libx/libxcb/libxcb-shm0_1.14-3_arm64.deb Undetermined Error [IP: 138.201.228.45 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
E: Failed to process build dependencies
Switching to http:// reliably works around the problem
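A minimal sketch of that workaround, assuming the repository is listed in a plain sources file such as /etc/apt/sources.list. The helper name `use_plain_http` is made up for illustration. APT still verifies the signed Release files over plain HTTP, so package integrity checks remain in place.

```shell
#!/bin/sh
# Hypothetical helper: rewrite https:// to http:// for repo.pureos.net
# in the APT sources file given as the first argument.
use_plain_http() {
    file="$1"
    # Keep a .bak copy next to the file so the change is easy to revert.
    sed -i.bak 's|https://repo\.pureos\.net|http://repo.pureos.net|g' "$file"
}
```

For example, `use_plain_http /etc/apt/sources.list` followed by `apt-get update` (both as root).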
maybe relevant to your research:
curl -O https://repo.pureos.net/pureos/dists/amber/main/source/Sources.xz works fine,
but curl -O --cert-status https://repo.pureos.net/pureos/dists/amber/main/source/Sources.xz fails:
curl: (91) No OCSP response received
The same check against https://deb.debian.org/ succeeds.
I.e. this issue can perhaps be narrowed down to OCSP stapling on our server.
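One way to double-check the stapling theory independently of curl is to ask `openssl s_client` for the OCSP status. The sketch below only classifies that output. The helper name `has_ocsp_staple` is an assumption, and it relies on OpenSSL printing "OCSP Response Status: successful" when a staple is present.

```shell
#!/bin/sh
# Hypothetical helper: read `openssl s_client -status` output on stdin and
# report whether it contains a stapled OCSP response.
has_ocsp_staple() {
    if grep -q 'OCSP Response Status: successful'; then
        echo "stapled"
    else
        echo "not stapled"
    fi
}

# Example (needs network access):
#   openssl s_client -connect repo.pureos.net:443 -servername repo.pureos.net \
#       -status </dev/null 2>/dev/null | has_ocsp_staple
```

If this prints "not stapled" for our server but "stapled" for deb.debian.org, the nginx `ssl_stapling` settings would be the first place to look.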
https://stackoverflow.com/a/60243923 mentions how to disable OCSP for apt:
touch /etc/apt/apt.conf.d/99verify-peer.conf \
  && echo >>/etc/apt/apt.conf.d/99verify-peer.conf "Acquire { https::Verify-Peer false }"
I tested this with https::Verify-Peer false, still the same issue happens:
Fetched 1016 MB in 4min 12s (4025 kB/s)
2021/03/04 22:30:39 apt | E: Failed to fetch https://repo.pureos.net/pureos/pool/main/f/fftw3/libfftw3-double3_3.3.8-2_amd64.deb Connection timed out [IP: 138.201.228.45 443]
2021/03/04 22:30:39 apt | E: Failed to fetch https://repo.pureos.net/pureos/pool/main/s/spice-gtk/libspice-client-glib-2.0-8_0.39-1_amd64.deb Connection timed out [IP: 138.201.228.45 443]
2021/03/04 22:30:39 apt | E: Failed to fetch https://repo.pureos.net/pureos/pool/main/libs/libsodium/libsodium23_1.0.18-1_amd64.deb Connection timed out [IP: 138.201.228.45 443]
2021/03/04 22:30:39 apt | E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
I pasted the wrong log, but the connection timeout is actually even more frequent now than the "Resource temporarily unavailable" issue. Both still appear.
I also tried adjusting the timeouts on APT's transport methods, with no luck. According to APT, the server just stops responding (according to curl, though, it doesn't).
Benchmark:
Server Software:        nginx/1.10.3
Server Hostname:        repo.pureos.net
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        X25519 253 bits
TLS Server Name:        repo.pureos.net
Document Path:          /
Document Length:        381 bytes
Concurrency Level:      5
Time taken for tests:   91.208 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      518000 bytes
HTML transferred:       381000 bytes
Requests per second:    10.96 [#/sec] (mean)
Time per request:       456.042 [ms] (mean)
Time per request:       91.208 [ms] (mean, across all concurrent requests)
Transfer rate:          5.55 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      285  339 164.2    300    1784
Processing:    92  114  71.8     96     641
Waiting:       91  102  45.8     95     637
Total:        379  452 179.8    398    1886
Percentage of the requests served within a certain time (ms)
  50%    398
  66%    402
  75%    410
  80%    427
  90%    547
  95%    733
  98%   1060
  99%   1489
 100%   1886 (longest request)
Certificate version: 3
Valid from: Oct 7 19:21:40 2020 GMT
Valid to : Sep 29 19:21:40 2021 GMT
Public key is 2048 bits
The issuer name is /O=Digital Signature Trust Co./CN=DST Root CA X3
The subject name is /C=US/O=Let's Encrypt/CN=R3
Extension Count: 8
Peer certificate
Certificate version: 3
Valid from: Feb 27 18:12:29 2021 GMT
Valid to : May 28 18:12:29 2021 GMT
Public key is 2048 bits
The issuer name is /C=US/O=Let's Encrypt/CN=R3
The subject name is /CN=downloads.pureos.net
Extension Count: 9
Transport Protocol :TLSv1.2
Cipher Suite Protocol :TLSv1.2
Cipher Suite Name :ECDHE-RSA-AES128-GCM-SHA256
Cipher Suite Cipher Bits:128 (128)
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES128-GCM-SHA256
Session-ID: 39DB1E294804DA2D5AB727DE4CF12062B4FA46A36F9DFA278CD675B3535CE0FD
Session-ID-ctx:
Master-Key: BCF95A63D726D1B9685B5293C6212D1CBD8620E94904D9D3A4CA8B6A9EAA6CF5976F668441B9F8F4DF24A70F457C5422
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 86400 (seconds)
TLS session ticket:
0000 - b1 a5 92 f4 25 9b 67 fc-d5 c9 5e 0b 0d ba e7 5e ....%.g...^....^
0010 - 66 2e d9 f2 68 3a 4f e9-3e 00 9d 33 7b e2 66 49 f...h:O.>..3{.fI
0020 - ff 93 f6 af 6a a0 64 7b-84 eb fc 07 f1 bf 10 ba ....j.d{........
0030 - 48 55 66 ca 4a 9e 44 de-3b 5e 7b f9 e0 e9 23 6a HUf.J.D.;^{...#j
0040 - 88 6f 52 da 28 43 c3 92-2b 9a da f7 d4 f1 3b 9c .oR.(C..+.....;.
0050 - 2e 6f 9c a3 71 78 cf f2-4d e6 b1 62 16 87 c3 01 .o..qx..M..b....
0060 - 58 7d b4 9f 89 e2 e2 98-39 71 3b bd 05 06 5d 22 X}......9q;...]"
0070 - 0e b6 fc 17 2c 86 08 13-3c e3 65 24 a3 7b 45 9a ....,...<.e$.{E.
0080 - 31 10 70 30 1e d7 64 92-09 b4 10 bf 09 e9 be 10 1.p0..d.........
0090 - 18 56 32 e6 60 bf 0f 24-10 ae df 8f 48 b9 8f 48 .V2.`..$....H..H
00a0 - 1c e3 fa bc 2b a7 d2 52-da 1f cf 28 d1 01 cd 95 ....+..R...(....
00b0 - 91 6b c6 b2 9d 60 96 a1-24 51 18 92 19 c9 ab 3b .k...`..$Q.....;
Start Time: 1615226134
Timeout : 7200 (sec)
Verify return code: 20 (unable to get local issuer certificate)
Extended master secret: yes
LOG: header received:
HTTP/1.1 200 OK
Server: nginx/1.10.3
Date: Mon, 08 Mar 2021 17:55:34 GMT
Content-Type: text/html; charset=utf-8
Connection: close
<html>
<head><title>Index of /</title></head>
<body bgcolor="white">
<h1>Index of /</h1><hr><pre><a href="../">../</a>
<a href="pureos/">pureos/</a> 13-Sep-2019 21:41 -
<a href="pureos-debug/">pureos-debug/</a> 14-Sep-2019 13:10 -
</pre><hr></body>
</html>
LOG: Response code = 200
SSL/TLS Alert [read] warning:close notify
Completed 1000 requests
SSL/TLS Alert [write] warning:close notify
Finished 1000 requests
Server Software:        nginx/1.10.3
Server Hostname:        repo.puri.sm
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
Server Temp Key:        X25519 253 bits
TLS Server Name:        repo.puri.sm
Document Path:          /
Document Length:        381 bytes
Concurrency Level:      1
Time taken for tests:   442.822 seconds
Complete requests:      1000
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      518000 bytes
HTML transferred:       381000 bytes
Requests per second:    2.26 [#/sec] (mean)
Time per request:       442.822 [ms] (mean)
Time per request:       442.822 [ms] (mean, across all concurrent requests)
Transfer rate:          1.14 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      285  335 153.0    299    1400
Processing:    92  107  57.6     97     526
Waiting:       91  101  37.1     96     396
Total:        378  443 165.2    396    1781
Percentage of the requests served within a certain time (ms)
  50%    396
  66%    399
  75%    401
  80%    403
  90%    680
  95%    698
  98%    984
  99%   1485
 100%   1781 (longest request)
What exactly was tested 1000 times? Some apt tool using an up-to-date libapt, or something more low-level?
If a different method than actual apt was used, is it certain that the level of TLS validation is equal to or stronger than that done by up-to-date libapt?
I notice the following in the above data dump (which I guess is a sample of the output from those 1000 calls):
Verify return code: 20 (unable to get local issuer certificate)
That looks like a potential anomaly to me...
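To exercise the real APT code path rather than a generic HTTP benchmark, something like the following could be used. It is only a sketch: `count_failures` is a made-up helper that runs any command a given number of times and prints how many runs failed, and the package name in the example is simply one of those seen failing in the logs.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print the number of failed runs.
# Usage: count_failures N command [args...]
count_failures() {
    runs="$1"
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status is counted.
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}

# Example, run from an empty scratch directory. The .deb is removed first so
# that every iteration fetches it from the server again:
#   count_failures 50 sh -c 'rm -f ./*.deb; apt-get download libllvm11'
```

Adding `-o Debug::Acquire::https=true` to the apt-get call should make the https transport log what it is doing, which may help compare APT's behaviour with the benchmark tool's.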
Some new observations:
- This is not a proxy server issue: the issue occurs even without a proxy
- The TLS version doesn't matter at all
- Before the issue occurs, we get quite a few TCP retransmissions from the client to the server, and then the current connection is dropped.
- There is nothing suspicious in the Nginx logs, not even at info priority. A quick glance at the debug logs also didn't show anything interesting, but those are massive and it's possible that I missed something.
There's also a TCP reset sent from the server to the client, but I'm not sure if that's actually related to the issue - it is suspicious that this always happens before the connection dies.
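For anyone who wants to look at those resets themselves, a capture filter along these lines should isolate them. This is a sketch, not the exact capture used here: `rst_filter` is a hypothetical helper that only builds a pcap filter expression for tcpdump, and the `tcp[tcpflags]` test applies to IPv4 traffic.

```shell
#!/bin/sh
# Hypothetical helper: print a pcap filter that matches TCP segments with the
# RST flag set, exchanged with the given host on the given port.
rst_filter() {
    host="$1"
    port="$2"
    printf 'host %s and tcp port %s and tcp[tcpflags] & tcp-rst != 0\n' "$host" "$port"
}

# Example (needs root), run on the client while apt is downloading:
#   tcpdump -n -i any "$(rst_filter 138.201.228.45 443)"
```

Comparing the timestamps of the captured RST packets with the time of the APT error would show whether the reset really precedes every failure.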
So, I am pretty sure now that this is something weird on our side, the client appears to behave fine. Nginx however also appears to behave as it should, so I wonder whether we have something else in the network or an odd firewall setting which causes this.
So, that RST packet is indeed the issue, and there's a high chance that either APT or GnuTLS doesn't handle it correctly. I talked with an APT developer, and we may need to debug this further in the future.
In the meantime, though, the issue can be mitigated by putting an Apache2 web server in front as a proxy, instead of Nginx.
I implemented that in our infrastructure, so the issue is gone for us now. Therefore I am marking this issue as resolved, but we may need to open a new one against either Nginx or, more likely, APT in the future to fix this properly.
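As a rough illustration of what such a front-end could look like (not the configuration actually deployed): the helper below writes a minimal Apache2 reverse-proxy virtual host. The function name, the backend URL passed to it, and the certificate paths are placeholders, and the sketch assumes the ssl, proxy and proxy_http modules are enabled (e.g. via `a2enmod`).

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Apache2 reverse-proxy vhost.
#   $1 = output file, $2 = backend URL (placeholder, e.g. http://backend.example/)
# The certificate paths below are placeholders as well.
write_proxy_vhost() {
    out="$1"
    backend="$2"
    cat > "$out" <<EOF
<VirtualHost *:443>
    ServerName repo.pureos.net
    SSLEngine on
    SSLCertificateFile /path/to/fullchain.pem
    SSLCertificateKeyFile /path/to/privkey.pem
    ProxyPreserveHost On
    ProxyPass "/" "$backend"
    ProxyPassReverse "/" "$backend"
</VirtualHost>
EOF
}
```

The generated file would go under /etc/apache2/sites-available/ and be enabled with `a2ensite`, followed by a reload of Apache.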
Hello,
The problem still persists on my Librem 5. I cannot download updates for tzdata and certificates.