melody359 2021-04-22 15:34

r: parallel::makeCluster hangs indefinitely when starting parallel processing

R version 3.6

Windows 10

When I run code in parallel with R, it hangs indefinitely. The code is:

cl <- parallel::makeCluster(1)

To watch what happens while the cluster is being set up, I used the future package:

cl <- future::makeClusterPSOCK(1,outfile = NULL, verbose = TRUE)
The output is:
[local output] Workers: [n = 1] ‘localhost’
[local output] Base port: 11025
[local output] Creating node 1 of 1 ...
[local output] - setting up node
[local output] - attempt #1 of 3
Testing if worker's PID can be inferred: ‘"D:/Program Files/R-3.6.3/bin/x64/Rscript" -e "try(suppressWarnings(cat(Sys.getpid(),file=\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac84217b7.pid\")), silent = TRUE)" -e "file.exists(\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac84217b7.pid\")"’
- Possible to infer worker's PID: FALSE
[local output] Starting worker #1 on ‘localhost’: "D:/Program Files/R-3.6.3/bin/x64/Rscript" --default-packages=datasets,utils,grDevices,graphics,stats,methods -e "try(suppressWarnings(cat(Sys.getpid(),file=\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac84217b7.pid\")), silent = TRUE)" -e "workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e) parallel:::.workRSOCK); workRSOCK()" MASTER=localhost PORT=11025 OUT= TIMEOUT=2592000 XDR=FALSE
[local output] - Exit code of system() call: 0
[local output] Waiting for worker #1 on ‘localhost’ to connect back
[local output] - Detected 'outfile=NULL' on Windows: this will make the output from the background worker visible when running R from a terminal, but it will most likely not be visible when using a GUI.
Failed to launch and connect to R worker on local machine ‘localhost’ from local machine ‘DESKTOP-GOVRM84’.
 * The error produced by socketConnection() was: ‘reached elapsed time limit’ (which suggests that the connection timeout of 120 seconds (argument 'connectTimeout') kicked in)
 * The localhost socket connection that failed to connect to the R worker used port 11025 using a communication timeout of 2592000 seconds and a connection timeout of 120 seconds.
 * Worker launch call: "D:/Program Files/R-3.6.3/bin/x64/Rscript" --default-packages=datasets,utils,grDevices,graphics,stats,methods -e "try(suppressWarnings(cat(Sys.getpid(),file=\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac84217b7.pid\")), silent = TRUE)" -e "workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e) parallel:::.workRSOCK); workRSOCK()" MASTER=localhost PORT=11025 OUT= TIMEOUT=2592000 XDR=FALSE.
 * Failed to kill local worker because it's PID is could not be identified.
 * Troubleshooting suggestions:
   - Suggestion #1: On Windows, to see output from worker, set 'outfile=NULL' and run R from a terminal (not a GUI).

[local output] - waiting 15 seconds before trying again
[local output] - attempt #2 of 3
Testing if worker's PID can be inferred: ‘"D:/Program Files/R-3.6.3/bin/x64/Rscript" -e "try(suppressWarnings(cat(Sys.getpid(),file=\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac4e5aba6.pid\")), silent = TRUE)" -e "file.exists(\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac4e5aba6.pid\")"’
- Possible to infer worker's PID: FALSE
[local output] Starting worker #1 on ‘localhost’: "D:/Program Files/R-3.6.3/bin/x64/Rscript" --default-packages=datasets,utils,grDevices,graphics,stats,methods -e "try(suppressWarnings(cat(Sys.getpid(),file=\"C:/Users/lenovo/AppData/Local/Temp/RtmpIjTjff/worker.rank=1.parallelly.parent=10668.29ac4e5aba6.pid\")), silent = TRUE)" -e "workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e) parallel:::.workRSOCK); workRSOCK()" MASTER=localhost PORT=11025 OUT= TIMEOUT=2592000 XDR=FALSE
[local output] - Exit code of system() call: 0
[local output] Waiting for worker #1 on ‘localhost’ to connect back

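The process gets stuck at "Waiting for worker #1 on ‘localhost’ to connect back", which means no connection to localhost can be established: the local machine never responds to the request. Strangely, this worked fine just a day earlier, and I can rule out insufficient memory as the cause.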
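To check the connect-back step on its own, outside of parallel and future, here is a minimal sketch that mimics what the log shows: the master listens on a localhost port and a separately launched Rscript process is expected to connect back to it. The helper name loopback_test(), the reuse of port 11025 from the log, and the 20-second timeout are assumptions for illustration. If this returns FALSE, something on the machine is preventing Rscript from reaching a localhost socket (what exactly is not known from the log). As far as I know, base parallel::makeCluster() waits for this connection using its communication timeout, which defaults to 2592000 seconds (30 days, the TIMEOUT=2592000 in the log), which would explain why it appears to hang forever.

```r
# Hypothetical helper (not part of parallel or future): listen on a localhost
# port and ask a separate Rscript process to connect back, like a PSOCK worker.
loopback_test <- function(port = 11025L, timeout = 20) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # Code run by the child: wait a moment so the listener is ready,
  # then connect to the master and send a single line.
  child <- sprintf(
    paste0("Sys.sleep(1); ",
           "con <- socketConnection('localhost', port = %d, blocking = TRUE, ",
           "open = 'r+', timeout = 10); ",
           "writeLines('hello', con); close(con)"),
    as.integer(port)
  )
  # Launch the child in the background (system2() quotes the command itself).
  system2(rscript, c("-e", shQuote(child)), wait = FALSE)
  # Accept the connection, giving up after 'timeout' seconds instead of hanging.
  con <- tryCatch(
    suppressWarnings(socketConnection("localhost", port = as.integer(port),
                                      server = TRUE, blocking = TRUE,
                                      open = "r+", timeout = timeout)),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  on.exit(close(con))
  msg <- tryCatch(readLines(con, n = 1L), error = function(e) character(0))
  identical(msg, "hello")
}

ok <- loopback_test()
print(ok)  # TRUE only if the child process managed to connect back
```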


1 answer

  • 码农阿豪@新空间, quality creator in the Java domain, 2024-07-14 22:45
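    Let 阿豪 help you with this. This answer was written with reference to ChatGPT-3.5; if you still have questions, feel free to comment or leave a message.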
    Continuing the log, attempt #2 fails with the same error as attempt #1, and so does the last attempt, after which the worker is shut down:

    [local output] - waiting 15 seconds before trying again
    [local output] - attempt #3 of 3
    Failed to launch and connect to R worker on local machine ‘localhost’ from local machine ‘DESKTOP-GOVRM84’.
     * The error produced by socketConnection() was: ‘reached elapsed time limit’ (which suggests that the connection timeout of 120 seconds (argument 'connectTimeout') kicked in)
     * The localhost socket connection that failed to connect to the R worker used port 11025 using a communication timeout of 2592000 seconds and a connection timeout of 120 seconds.
    [local output] - shutting down the worker node

    Based on this output, the PSOCK worker never connects back to the master, so the socket connection times out. You can try a different parallel backend, for example multicore (forked) processing, where the worker is forked from the current R session instead of having to connect back over a socket:

    future::plan(future::multicore)

    After switching to this backend the connection timeout no longer occurs. Note, however, that forked processing is not supported on Windows: on Windows 10, plan(multicore) falls back to sequential processing, so the timeout is avoided only because no parallel worker is started.
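    Because forking is unavailable on Windows, a script meant to run on several systems has to branch on the platform. Below is a minimal sketch using base parallel::mclapply(), which forks worker processes on Unix-alikes but only accepts mc.cores = 1 on Windows. The helper name run_forked() and the default of 2 cores are assumptions for illustration.

```r
# Hypothetical helper: use forked workers where the OS supports them.
run_forked <- function(x, f, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # mclapply() rejects mc.cores > 1 on Windows, so run sequentially there
    cores <- 1L
  }
  parallel::mclapply(x, f, mc.cores = cores)
}

res <- run_forked(1:3, function(i) i^2)
str(res)
```

    Unlike the PSOCK cluster in the question, no worker has to connect back over a socket here, so the connect-back timeout cannot occur; on Windows, however, the work simply runs in the current R session.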
