nrn_timeout error

Posted: Wed Mar 16, 2011 1:35 pm
by jackfolla
Dear all,
during my runs I get the nrn_timeout error.

Code: Select all

nrn_timeout t=6525.15
[gozer3:21108] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
mpiexec noticed that job rank 1 with PID 21109 on node gozer3 exited on signal 15 (Terminated). 
6 additional processes aborted (not shown)
In /src/nrniv/netpar.cpp I tried changing
nrn_timeout(20) to nrn_timeout(500).

With nrn_timeout(20) the error occurred at t=3276.75 ms.

Code: Select all

nrn_timeout t=3276.75
[gozer3:21108] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
mpiexec noticed that job rank 1 with PID 21109 on node gozer3 exited on signal 15 (Terminated). 
6 additional processes aborted (not shown)
I also tried nrn_timeout(1000), but then the following error occurred:

Code: Select all

mpiexec noticed that job rank 1 with PID 18859 on node gozer3 exited on signal 15 (Terminated). 
6 additional processes aborted (not shown)
1 process killed (possibly by Open MPI)
The runs were performed on an 8-core machine (2 quad-core processors).

I see that the problem depends on one procedure in particular (if I comment out this proc, the problem does not occur):

Code: Select all

proc a_record() {local j
	// record the time base once
	rec_time = new Vector()
	listrec_a = new List()
	rec_time.record(&t)
	// one Vector per selected NetCon, sampled at every time step
	for (j=1; j<ncslist.count; j=j+2) {	// loop over possible target cells
		rec_a = new Vector()
		rec_a.record(&ncslist.o(j).weight[1])
		listrec_a.append(rec_a)
	}
}
Maybe the amount of data collected is too large...

Re: nrn_timeout error

Posted: Sun Mar 20, 2011 11:24 am
by hines
The timeout is turned off if you set it to 0.

Re: nrn_timeout error

Posted: Sun Mar 20, 2011 11:31 am
by hines
Assuming that dt=.025, the problem occurs when the Vectors reach a size of
oc>6525.15/dt
261006
oc>3276.75/dt
131070
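The same division can be checked outside the interpreter. This is a minimal Python sketch, assuming dt = 0.025 ms as above; the helper names are made up for illustration. It also prints the next power of two above each size, because both sizes fall just under one (2^17 and 2^18). That would be consistent with a buffer that doubles when it fills up, although nothing in this thread confirms the exact growth policy.

```python
DT_MS = 0.025  # assumed time step in ms, as in the reply


def samples_recorded(t_ms, dt_ms=DT_MS):
    """Number of time steps taken by time t_ms (hypothetical helper)."""
    return round(t_ms / dt_ms)


def next_power_of_two(n):
    """Smallest power of two that is >= n (hypothetical helper)."""
    return 1 << (n - 1).bit_length()


# The two failure times reported in the error messages.
for t_fail in (6525.15, 3276.75):
    n = samples_recorded(t_fail)
    print(f"t={t_fail} ms: {n} samples, next power of two is {next_power_of_two(n)}")
```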
I don't know how many NetCons are involved. It sounds like the time is being spent
copying the Vectors when they run out of allocated space and twice the memory is reallocated.
Clearly, recording all the weights at every time step is not very space efficient.
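To get a feel for the space cost, here is a rough Python estimate. The number of recorded NetCons is not given in this thread, so N_RECORDED below is purely hypothetical, as are TSTOP_MS and the coarser 1 ms sampling interval. The only firm inputs are dt = 0.025 ms (assumed above) and 8 bytes per double-precision sample.

```python
BYTES_PER_SAMPLE = 8  # one double-precision value per recorded sample


def recording_bytes(n_vectors, tstop_ms, interval_ms):
    """Bytes needed to hold n_vectors recordings sampled every interval_ms."""
    samples = round(tstop_ms / interval_ms) + 1  # +1 for the sample at t = 0
    return n_vectors * samples * BYTES_PER_SAMPLE


N_RECORDED = 1000   # hypothetical number of recorded weights
TSTOP_MS = 10000.0  # hypothetical simulation length in ms

every_step = recording_bytes(N_RECORDED, TSTOP_MS, 0.025)  # sample every dt
every_ms = recording_bytes(N_RECORDED, TSTOP_MS, 1.0)      # sample every 1 ms

print(f"every time step: {every_step / 1e6:.1f} MB")
print(f"every 1 ms:      {every_ms / 1e6:.1f} MB")
```

With these made-up numbers, sampling once per millisecond needs about 40 times less memory than sampling at every step. As far as I know, Vector.record also has a form that takes a sampling interval as a second argument, which would be one way to record less often.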