Elapsed time exceeded
I have had 23 WUs error out for exceeding the elapsed time limit. These began occurring on 2011-08-09. Others are likely to follow, as I have a considerable number of WUs in progress. Any educated guesses?
|ID: 75 | Rating: 0 | rate: /|
The computational size parameters (rsc_fpops_est, rsc_fpops_bound) are
expressed in terms of number of floating-point operations. For example, suppose
a job J takes 1 hour to complete on a machine with a Whetstone benchmark of 1
GFLOPS; then the "size" of J is 3.6e12 FLOPs. To get an initial estimate of job
size, run several typical jobs on your own computer, see how long they take,
and multiply by the Whetstone score of the computer (to find this, run BOINC on
the computer and look at the event log).
We get a computational size estimation of ~1.05e12 FLOPs (3.835e11 x 2.74).
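As a rough sketch of the arithmetic in the quoted documentation (runtime multiplied by the host's Whetstone score), something like the following could be used. The function name and the sample values are only illustrative assumptions, not anything taken from BOINC itself.

```python
def estimate_job_size(runtime_seconds: float, whetstone_flops: float) -> float:
    """Estimate job size in floating-point operations.

    Hypothetical helper: multiplies the measured runtime (seconds) by the
    host's Whetstone benchmark (floating-point operations per second).
    """
    return runtime_seconds * whetstone_flops


# The documentation's example: 1 hour on a 1 GFLOPS host.
one_hour_job = estimate_job_size(3600, 1e9)
print(f"{one_hour_job:.2e} FLOPs")

# The product quoted in this post, computed directly.
quoted_product = 3.835e11 * 2.74
print(f"{quoted_product:.3e} FLOPs")
```

Averaging the runtimes of several typical jobs before multiplying should give a steadier estimate than a single run.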
All jobs are sent with the following computational size estimations:
This should have been enough to complete the jobs. To find out why it wasn't, I
searched the web and came across the following post (it dates from 2009, so I
don't know whether it still applies):
The rsc_fpops_bound value is divided by the host's fpops/second value in
order to set the actual max_elapsed_time value. The host's fpops/second is
either the Whetstone benchmark or a <flops> value from an app_info.xml file.
The Whetstone benchmark shown as "Measured floating point speed" on the host
page looks reasonable (though that doesn't guarantee it was reasonable on the
host at the times the errors occurred). Is it possible your app_info.xml has a
<flops> value for the 6.03 app which has a couple extra zeroes in the value?
That's my best guess why BOINC is thinking that less than an hour is too much
These errors could be caused either by such a <flops> value or by the
"Measured floating point speed" being different during the run. Comparing the CPU
times with the elapsed times for the failed jobs suggests the latter.
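To illustrate the rule described in the quoted post (rsc_fpops_bound divided by the host's fpops/second gives the elapsed time limit), here is a minimal sketch. It assumes the 2009 description still holds; the function name and all numbers are hypothetical and chosen only to show how a <flops> value with two extra zeroes would shrink the limit a hundredfold.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, host_flops: float) -> float:
    """Elapsed time limit as described in the quoted post.

    Hypothetical helper: the bound (in floating-point operations) divided by
    the host's speed (operations per second, from the Whetstone benchmark or
    a <flops> value in app_info.xml).
    """
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return rsc_fpops_bound / host_flops


# Hypothetical values, for illustration only.
bound = 3.6e13          # assumed rsc_fpops_bound
whetstone = 1e9         # assumed benchmark: 1 GFLOPS
inflated = 1e11         # same figure with two extra zeroes

print(max_elapsed_seconds(bound, whetstone) / 3600, "hours allowed")
print(max_elapsed_seconds(bound, inflated) / 60, "minutes allowed")
```

With these made-up numbers the limit drops from 10 hours to 6 minutes, which is the kind of effect that would make jobs fail well before they could finish.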
Erasmus Grid Office
|ID: 76 | Rating: 0 | rate: /|
I'm not quite clear on this, so I looked at one of my own systems.
|ID: 79 | Rating: 0 | rate: /|
I followed the lines of thinking suggested by both Honto ni and Krunchin Keith and found that the error problem was neither resolved nor its source confirmed. All machines generating the errors were running XP 64-bit and BOINC 6.10.58. Looking at similar active computers of zombie67, which were generating next to no errors, the only difference seemed to be that I run mostly AMD CPUs. As a result, I directed all the affected machines to other projects, testing a few WUs now and then.
|ID: 222 | Rating: 0 | rate: /|
Yes, but the problem you refer to is an app-specific setting. This could happen on any project to any app, so what goes on at another project is irrelevant here.
|ID: 224 | Rating: 0 | rate: /|