
Nrn timeout error

Posted: Fri Jan 18, 2013 3:03 am
by shyam_u2
I am getting this timeout error when I run my model.

nrn_timeout t=3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 28500 on
node tombo11103 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

May I know what causes this?

Re: Nrn timeout error

Posted: Fri Jan 18, 2013 3:38 pm
by hines
It means that during a run, 20 seconds of wall time passed without the simulation time t increasing.
This check exists to avoid wasting thousands of CPU hours on a supercomputer while waiting for the job's time limit to be reached.
It could happen if there is a bug that causes an MPI collective to wait forever.
But sometimes it means you are stopping the sim and taking a long time to write data.
I don't know which it is in your case.
You can set your own timeout, in seconds, with pc.timeout(x), where pc is a ParallelContext instance. If x is 0, the timeout is off.
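To make the mechanism concrete, here is a minimal Python sketch of a watchdog of this kind. It is not NEURON's implementation: the class SimTimeoutWatchdog, its methods, and the injectable clock are hypothetical and exist only for illustration. It remembers the last wall-clock time at which the simulation time t advanced and reports a stall once that gap exceeds the limit; a limit of 0 disables the check, mirroring what pc.timeout(0) does.

```python
import time
from typing import Callable, Optional


class SimTimeoutWatchdog:
    """Hypothetical watchdog: flags a stall when simulation time t has not
    increased for more than `timeout` seconds of wall-clock time.
    A timeout of 0 disables the check (analogous to pc.timeout(0))."""

    def __init__(self, timeout: float = 20.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock
        self._last_t: Optional[float] = None
        self._last_progress_wall = clock()

    def set_timeout(self, seconds: float) -> float:
        """Set a new limit and return the previous one."""
        old, self.timeout = self.timeout, seconds
        return old

    def check(self, t: float) -> bool:
        """Call periodically with the current simulation time.
        Returns True if the run should be aborted as stalled."""
        now = self._clock()
        if self._last_t is None or t > self._last_t:
            # Simulation time advanced: record this as the latest progress point.
            self._last_t = t
            self._last_progress_wall = now
            return False
        if self.timeout == 0:
            return False  # check disabled
        return now - self._last_progress_wall > self.timeout


if __name__ == "__main__":
    fake_now = [0.0]  # controllable wall clock so the demo does not sleep
    wd = SimTimeoutWatchdog(timeout=20.0, clock=lambda: fake_now[0])
    wd.check(3.0)      # t = 3 reached at wall time 0 s
    fake_now[0] = 21.0  # 21 s of wall time pass with t stuck at 3
    print(wd.check(3.0))  # True: the run would be aborted
```

Both causes Hines mentions fit this picture: a collective that never returns and a long data write at the end of a run each leave t unchanged while wall time keeps passing.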

Re: Nrn timeout error

Posted: Tue Jan 22, 2013 3:22 am
by shyam_u2
Let me explain the context of this error.
I am working with pc.source_var and pc.target_var to make gap junctions work in an MPI environment.
I added lines of code incrementally and compiled them. Until I insert the statement pc.setup_transfer(), everything is fine (the model completes execution). But when I insert pc.setup_transfer(), it hangs for some time and finally throws the nrn_timeout error.
Do you have any idea what's going on here?
You wrote: "pc.timeout(x) and if x is 0 the timeout is off. (pc must be a ParallelContext instance)"
NEURON says timeout is not a public member of ParallelContext.

Re: Nrn timeout error

Posted: Tue Jan 22, 2013 4:02 pm
by hines
Your experience with setup_transfer may be due to a bug which has already been fixed.
The last change to that area of the code was 6 months ago.
ww.neuron.yale.edu/hg/neuron/nrn/rev/b60b3450eff6

Also, ParallelContext.timeout was introduced into the main trunk of the repository 10 months ago.
http://www.neuron.yale.edu/hg/neuron/nr ... c98139370e
So I think it makes sense for you to build either from the repository sources or from the tar.gz file at
http://www.neuron.yale.edu/ftp/neuron/versions/alpha/

If you continue to have timeout problems in pc.setup_transfer, you can send me a zip file with all the hoc and mod files needed to reproduce the problem, and I can do some diagnosis.

Re: Nrn timeout error

Posted: Mon Jan 28, 2013 1:30 am
by shyam_u2
Everything I described above happens only when I declare vgap as a RANGE variable in gap.mod (the gap junction mechanism from section 10.1.2 of the NEURON Book). When I change vgap to a POINTER variable, as it is in the NEURON Book, I get a segmentation fault at stdinit.

What causes this segmentation fault?
Any idea would be greatly appreciated.

Thanks.

Re: Nrn timeout error

Posted: Mon Jan 28, 2013 7:23 am
by hines
A mod file that declares vgap to be a POINTER would be inconsistent with gap junctions that work in a parallel program. A vgap POINTER can only "watch" a variable, v, in the same address space; it cannot watch a variable on another machine.
Implementing parallel gap junctions, in which vgap and v can be on different machines, requires a pair of ParallelContext.source_var and target_var calls, each made on the machine where the corresponding variable exists, and the target variable (vgap) must be a RANGE variable, not a POINTER.
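The address-space argument can be illustrated with a toy, single-process Python model in which each "rank" owns a private dictionary of variables. This is not NEURON code and uses no MPI: the Rank class, the functions build_transfer_table and exchange, and the integer transfer ids are hypothetical stand-ins that only loosely correspond to source_var, target_var and setup_transfer. The source side publishes a variable under an id on the rank that owns v, the target side registers which of its own RANGE-style slots (vgap) should receive that id, and the setup step checks the registrations once and builds a routing table, so each exchange copies values by id instead of dereferencing an address on another machine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Rank:
    """Hypothetical stand-in for one MPI process with a private address space."""
    rank_id: int
    variables: Dict[str, float] = field(default_factory=dict)
    # transfer id -> name of the local variable whose value is published
    sources: Dict[int, str] = field(default_factory=dict)
    # (transfer id, name of the local RANGE-style slot that receives the value)
    targets: List[Tuple[int, str]] = field(default_factory=list)

    def publish_source(self, var_name: str, sid: int) -> None:
        """Loosely analogous to source_var: offer a local value (e.g. v) under sid."""
        self.sources[sid] = var_name

    def register_target(self, var_name: str, sid: int) -> None:
        """Loosely analogous to target_var: copy the value published under sid
        into a local slot (e.g. vgap). No reference to remote memory is kept."""
        self.targets.append((sid, var_name))


def build_transfer_table(ranks: List[Rank]) -> Dict[int, int]:
    """Loosely analogous to setup_transfer: map each transfer id to the rank
    that owns the source, and check that every target has a source."""
    owner: Dict[int, int] = {}
    for r in ranks:
        for sid in r.sources:
            if sid in owner:
                raise ValueError(f"transfer id {sid} published twice")
            owner[sid] = r.rank_id
    for r in ranks:
        for sid, _ in r.targets:
            if sid not in owner:
                raise KeyError(f"no source registered for transfer id {sid}")
    return owner


def exchange(ranks: List[Rank], owner: Dict[int, int]) -> None:
    """One transfer step: copy each published value into every matching target.
    A real parallel run would send messages here instead of reading a dict."""
    by_id = {r.rank_id: r for r in ranks}
    for r in ranks:
        for sid, var_name in r.targets:
            src = by_id[owner[sid]]
            r.variables[var_name] = src.variables[src.sources[sid]]


if __name__ == "__main__":
    # v lives on rank 0; the gap junction's vgap slot lives on rank 1.
    r0 = Rank(0, {"v": -65.0})
    r1 = Rank(1, {"vgap": 0.0})
    r0.publish_source("v", sid=1)
    r1.register_target("vgap", sid=1)
    owner = build_transfer_table([r0, r1])
    exchange([r0, r1], owner)
    print(r1.variables["vgap"])  # -65.0
```

A gap junction couples both cells, so a real setup typically needs the transfer in both directions: each side publishes its own v and receives the other cell's voltage into its own vgap slot, which in this toy model simply means registering two ids.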

As for the timeout problem: if it still occurs with the latest version of NEURON, send me your code along with instructions for reproducing the problem, and I can try to diagnose what is going wrong. Send the zip file to michael dot hines at yale dot edu.

Re: Nrn timeout error

Posted: Mon Jan 28, 2013 8:03 pm
by shyam_u2
OK. I will run it with the latest NEURON version. Thank you, Hines.

Re: Nrn timeout error

Posted: Tue Jan 29, 2013 3:24 am
by shyam_u2
I installed the latest alpha version, and it works fine. Thank you for your help.