A ring network with multithreading

General issues of interest both for network and
individual cell parallelization.

Moderator: hines

jackfolla
Posts: 48
Joined: Wed Jul 07, 2010 7:42 am


Post by jackfolla »

Hi.

I would like to know how to apply multithreading to the ring network (the first example, section 3) shown in the tutorial "Translating network models to parallel hardware in NEURON" (Hines and Carnevale). In that example we have a set of neurons, each of which is assigned to a specific host (obviously, a host may have multiple neurons if the number of hosts is less than the number of neurons).

In the serial version I have a procedure run() that uses a for loop to call the procedures connectcells(), spikerecord(), and spikeout():

proc run() {
   for i=0, n_simul-1 {
     connectcells()  
     spikerecord()
     stdinit()
     continuerun(tstop)
     spikeout()
   }
}
My current parallel version of this procedure is very similar to the code shown above:

proc prun() {
   for i=0, n_simul-1 {
     connectcells()  
     spikerecord()
     {pc.set_maxstep(10)}
     stdinit()
     {pc.psolve(tstop)}
     spikeout()
   }
}
Now I want to launch these simulations in parallel. I thought that the best approach would be multithreading, but I have no idea how to proceed.

Could you help me?

PS: sorry for my English.
ted
Site Admin
Posts: 6286
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine

Re: A ring network with multithreading

Post by ted »

jackfolla wrote: how is it possible to apply multithreading to the ring network (the first example, section 3) shown in the tutorial "Translating network models to parallel hardware in NEURON" (Hines and Carnevale).
With current implementations of NEURON, if a model is distributed across multiple hosts with MPI (or OpenMPI, or whatever), the code on each host will run in single threaded mode. In order for a model to be simulated with multiple threads, it must be running on a single host.

Multithreaded execution on a single host is very easy--the only caveats are that it works only with fixed step and global variable time step integration, all mod files must be "thread safe," and multithreaded execution cannot be used with models that involve extracellular or LinearMechanism. For instructions, see
http://www.neuron.yale.edu/phpBB/viewto ... =22&t=1476
In the serial version I have a procedure run() that uses a for loop to call the procedures connectcells(), spikerecord(), and spikeout():

proc run() {
   for i=0, n_simul-1 {
     connectcells()  
     spikerecord()
     stdinit()
     continuerun(tstop)
     spikeout()
   }
}
Although it is sometimes useful to modify run(), it is not necessary or advisable to insert model setup or instrumentation code into this procedure. connectcells() and spikerecord() only have to be called once--connectcells() during model setup, and spikerecord() when setting up the instrumentation. Neither of these procedures should be called by run().

If you need to run multiple simulations on serial hardware, it is usually not a good idea to modify run(). Instead, it is better to create a special procedure. The following proc should work on serial hardware:

proc batchrun() { local i
  for i=0, $1-1 {
    run()
    spikeout()
  }
}
A single procedure call with a numerical argument, like this
batchrun(10)
will launch a family of simulations.

A big warning: if your model involves any representation of "learning" or "plasticity" that changes model parameters, your batchrun() procedure must reset the model parameters to their proper starting values before it calls run().
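A minimal sketch of that warning, written in Python (stdlib only) rather than hoc so the logic can be checked on its own. simulate() is a hypothetical stand-in for run() with a toy "plasticity" rule that changes the weights during each run; batchrun() takes a snapshot of the starting weights once and restores them before every run, so all runs in the family start from the same state.

```python
def simulate(weights):
    """Hypothetical stand-in for run(): a toy "plasticity" rule that
    strengthens every weight by 10% during the run, then returns the
    summed weight as a placeholder for the simulation result."""
    for i in range(len(weights)):
        weights[i] *= 1.1
    return sum(weights)


def batchrun(weights, nruns):
    """Run a family of simulations, restoring the starting weights
    before each run so that every run begins from the same state."""
    initial = list(weights)  # snapshot of the starting parameters
    results = []
    for _ in range(nruns):
        weights[:] = initial  # reset in place; without this, runs would drift
        results.append(simulate(weights))
    return results


if __name__ == "__main__":
    print(batchrun([0.01, 0.01, 0.01], 3))  # three identical results
```

Deleting the `weights[:] = initial` line makes each result larger than the previous one, which is exactly the run-to-run contamination the warning is about.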

For models parallelized with MPI, I would expect that a batch procedure written in hoc could be used to run a family of simulations, but I have not done this yet myself and will have to get back to you with a better answer.
Now I want to launch these simulations in parallel
The exact details depend somewhat on your operating system and what implementation of MPI you are using. I'm using Linux and MPICH2, and the instructions for starting the MPI daemon and running simulations that are given in our 2008 article in Journal of Neuroscience Methods work fine for me. Under MSWin, I have to open an rxvt window (see the icon in the NEURON Program Group), then type
mpd&
to start the MPI daemon. Then I can CD to the directory where my model's main hoc file is located, and type the command
mpiexec -np NUMHOSTS nrniv -mpi myprogram.hoc
where NUMHOSTS is replaced by the number of hosts that I want to specify.
ted

Re: A ring network with multithreading

Post by ted »

ted wrote:For models parallelized with MPI, I would expect that a batch procedure written in hoc could be used to run a family of simulations, but I have not done this yet myself and will have to get back to you with a better answer.
Here's how.

First, copy ringpar.hoc to batringpar.hoc

Then edit batringpar.hoc, making the following changes:

1. Comment out these lines

{pc.set_maxstep(10)}
stdinit()
{pc.psolve(tstop)}
that is, change them to

// {pc.set_maxstep(10)}
// stdinit()
// {pc.psolve(tstop)}
Also comment out these lines

spikeout()

{pc.runworker()}
{pc.done()}
quit()
that is, change them to

// spikeout()

// {pc.runworker()}
// {pc.done()}
// quit()
2. Insert the following code at the end of the file

// family of runs

proc setparams() { local i
  for i=0,nclist.count()-1 nclist.o(i).weight = 0.01*($1+1)/2
}

proc batchrun() { local i
  for i=0,$1-1 {
    setparams(i)
    if (pc.id==0) printf("\nRun %d  weight = %f\n", i, nclist.o(0).weight)

    pc.set_maxstep(10)
    stdinit()
    pc.psolve(tstop)

    // Report simulation results
    spikeout()
  }
  pc.done()
}

batchrun(nr)
quit()
Then you can start the MPI daemon by executing the command
mpd &
in a terminal window, and run a simulation by executing the command
mpiexec -np 2 nrniv -mpi -c "nr=2" batringpar.hoc
This will assign the value 2 to the parameter nr before loading and executing batringpar.hoc. To repeat with a different value for nr, just replace the "2" in "nr=2" with whatever you want.

Procedure batchrun() contains a loop that calls procedure setparams() nr times. On each call, the value of the argument to setparams is used to set the strength of the excitatory connections in the ring, and then a simulation is executed and results are printed. setparams() can be revised to make whatever changes you wish from run to run.
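To see which weight each run actually gets, here is the arithmetic of setparams() reproduced as a small Python sketch (an illustration, not NEURON code). Because batchrun() passes i = 0 .. nr-1, run i uses 0.01*(i+1)/2, so with nr=2 the two runs use 0.005 and 0.01.

```python
def weight_for_run(i):
    """Same arithmetic as setparams(i): 0.01*(i+1)/2."""
    return 0.01 * (i + 1) / 2


def weight_schedule(nr):
    """Weights used by batchrun(nr), one per run i = 0 .. nr-1."""
    return [weight_for_run(i) for i in range(nr)]


if __name__ == "__main__":
    for i, w in enumerate(weight_schedule(2)):
        # same format as the printf in batchrun()
        print("Run %d  weight = %f" % (i, w))
```

Replacing the body of weight_for_run() is a quick way to preview a different schedule before editing setparams() itself.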
jackfolla

Re: A ring network with multithreading

Post by jackfolla »

Hi Ted.
Thank you very much for your help: it's very useful.
I used your suggestions and corrected both the serial and the parallel code: both now work correctly.
Next, I tried using the ParallelComputeTool to exploit multithreading in the serial code, by adding these lines:

{load_file("parcom.hoc")}
ParallelComputeTool[0].nthread(20)
Initially it did not work; it returned the following error message:
usable mindelay is 0 (or less than dt for fixed step method)
0 in Bat_Connect_Cell.hoc near line 181
0 batchrun(nr)
^
0 finitialize(-65)
0 init()
0 stdinit()
0 batchrun(2)
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
rank 0 in job 19 pasquale-laptop_40121 caused collective abort of all ranks
exit status of rank 0: return code 255
Then I changed the value of ncstim.delay from 0 to 0.1 (dt is 0.025), and now it works correctly and quickly.
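The error message above states the condition directly: the usable minimum connection delay must be greater than 0, and with the fixed step method it must not be less than dt. A small Python sketch of that check (not a NEURON API; the function names and the 1.0 ms ring delay in the example are hypothetical, and treating every delay as relevant is a simplification of how NEURON computes its own mindelay):

```python
def usable_mindelay(delays):
    """Smallest delay (ms) among the network's connections. This is a
    simplification: here every delay counts, whereas NEURON decides which
    connections enter its own mindelay calculation."""
    return min(delays)


def delays_ok_for_threads(delays, dt, fixed_step=True):
    """True if the smallest delay passes the test in the error message:
    it must be > 0, and with fixed step it must not be less than dt."""
    md = usable_mindelay(delays)
    if md <= 0:
        return False
    if fixed_step and md < dt:
        return False
    return True


if __name__ == "__main__":
    dt = 0.025
    print(delays_ok_for_threads([0.0, 1.0], dt))  # ncstim.delay = 0   -> False
    print(delays_ok_for_threads([0.1, 1.0], dt))  # ncstim.delay = 0.1 -> True
```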

Finally, I tried adding the same lines to the parallel version, but it doesn't work correctly: it reports fewer spikes.
For example, here is the output of the correct parallel version:
Run 1 weight = 0.010000

time cell
Dim tvec: 16
25.8 0
34.25 2
42.7 4
43.775 0
51.15 6
52.2 2
59.6 8
60.65 4
68.05 10
69.1 6
76.5 12
77.55 8
84.95 14
86 10
93.4 16
94.45 12

Dim tvec: 16
30.025 1
38.475 3
46.925 5
47.975 1
55.375 7
56.425 3
63.825 9
64.875 5
72.275 11
73.325 7
80.725 13
81.775 9
89.175 15
90.225 11
97.625 17
98.675 13
And here is the output of the parallel version after adding:

{load_file("parcom.hoc")}
ParallelComputeTool[0].nthread(2)
2
2
Run 1 weight = 0.010000

time cell
Dim tvec: 4
25.8 0
34.25 2
42.7 4
43.775 0

Dim tvec: 3
30.025 1
38.475 3
47.975 1
Did I do something wrong, or is this simply not possible?

Thank you in advance for your time.
ted

Re: A ring network with multithreading

Post by ted »

Perhaps I was not sufficiently explicit when I posted this comment
With current implementations of NEURON, if a model is distributed across multiple hosts with MPI (or OpenMPI, or whatever), the code on each host will run in single threaded mode. In order for a model to be simulated with multiple threads, it must be running on a single host.
or maybe it was lost in the middle of everything else.

To put it a different way: current versions of NEURON (including any of the alpha installers) can run multithreaded simulations of a model on a single host, or single threaded simulations of a model that is distributed over multiple hosts. They cannot run multithreaded simulations of a model that is distributed over multiple hosts.

Which means that if you want to simulate the ring network using multithreaded execution, do so with the serial implementation of the model, not with the parallelized implementation.
jackfolla

Re: A ring network with multithreading

Post by jackfolla »

Hi Ted.
Thank you for your help; it is very valuable.

I have another question. In single-threaded mode, when I need to add new mechanisms to the interpreter, I use "nrnivmodl", and then I run with "nrngui x.hoc".
In multithreaded mode I need to make these files thread safe with "mkthreadsafe".

So I use "nrnivmodl", then "mkthreadsafe", and finally "nrngui x.hoc".

But an error like this occurs: "nrniv: GABAb is not thread safe near line 0" (one of the .mod files is gabab.mod).

Have I done something wrong?

I'm sorry if my description of the problem was not very clear.
jackfolla

Re: A ring network with multithreading

Post by jackfolla »

Hi Ted,

I solved it thanks to this:
Use cnexp or derivimplicit instead of euler

The euler integration method is not thread safe (it is also numerically unstable, which should be reason enough to avoid it). Instead, use cnexp if each dstate/dt equation depends linearly only on the state, or use derivimplicit.
http://www.neuron.yale.edu/phpBB/viewto ... =22&t=1476

Bye.
ted

Re: A ring network with multithreading

Post by ted »

Glad to hear that you were able to solve that.

Every time I find a new mod file that I might want to use, the first thing I do is to make sure that it is using the correct integration method. Some old mod files that use euler are still floating around.
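The numerical-stability half of the euler vs cnexp advice quoted above can be seen with a single gating-style equation, dx/dt = (xinf - x)/tau. The Python sketch below compares forward Euler with the exact exponential update for that linear equation, which is the idea behind cnexp for state-linear equations. It does not illustrate the thread-safety issue, and tau = 0.4 ms and dt = 1 ms are hypothetical values chosen to make the instability obvious.

```python
import math


def euler_step(x, xinf, tau, dt):
    """Forward Euler for dx/dt = (xinf - x)/tau."""
    return x + dt * (xinf - x) / tau


def exp_step(x, xinf, tau, dt):
    """Exact one-step solution of the same linear equation; integrating
    a state-linear equation analytically is the idea behind cnexp."""
    return xinf + (x - xinf) * math.exp(-dt / tau)


def integrate(step, x0=0.0, xinf=1.0, tau=0.4, dt=1.0, nsteps=20):
    """Apply `step` nsteps times starting from x0."""
    x = x0
    for _ in range(nsteps):
        x = step(x, xinf, tau, dt)
    return x


if __name__ == "__main__":
    # dt = 1 ms is larger than 2*tau = 0.8 ms, so forward Euler is unstable:
    # each step multiplies the error by (1 - dt/tau) = -1.5.
    print("euler:", integrate(euler_step))  # oscillates with growing amplitude
    print("exp:  ", integrate(exp_step))    # settles at xinf = 1.0
```

With a small enough dt (for example 0.1 ms here) Euler does converge, which is why a mod file using euler can look fine until the time step or the kinetics change.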
jackfolla

Re: A ring network with multithreading

Post by jackfolla »

Dear Ted,

now I would like to collect the execution times and calculate, for each family of simulations, the average of these times, while varying:

In the case of the serial code: number of cells and number of threads.
In the case of the parallel code: number of cells and number of processors.

If possible, I would like to automate this.

You told me that the procedures mkcells(), connectcells() etc. only have to be called once.

Is there a way to automate this?

Your suggestions are very useful for my work.
Thanks, Pasquale.
ted

Re: A ring network with multithreading

Post by ted »

jackfolla wrote: now I would like to collect the execution times and calculate, for each family of simulations, the average of these times, while varying:

In the case of the serial code: number of cells and number of threads.
In the case of the parallel code: number of cells and number of processors.

If possible, I would like to automate this.
This is doable by a combination of hoc and shell programming. Very briefly, the number of cells and the number of processors must be specified by statements on the command line in a shell script. The number of threads can be specified by a "for loop" in your hoc program. See below for more detail.
You told me that the procedures mkcells(), connectcells() etc. only have to be called once.
True as long as the number of cells does not change. If the number of cells changes, it is best to exit NEURON and start anew.


The steps required to do what you want are:

1. Revise the serial and parallel implementations so that they report the run time. Do this with startsw() (documented in the Programmer's Reference). For example

runtime=startsw()
run()
runtime=startsw()-runtime
print runtime
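Averaging over a family of runs is just repeated timing plus a mean. A Python sketch of the same pattern, where time.perf_counter() plays the role of startsw() and the workload is a dummy stand-in for run() (both the helper names and the workload are hypothetical):

```python
import statistics
import time


def timed(fn):
    """Wall-clock duration of one call, analogous to
    runtime=startsw()  run()  runtime=startsw()-runtime"""
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0


def mean_runtime(fn, nruns):
    """Time `nruns` calls of fn; return (mean, list of individual times)."""
    times = [timed(fn) for _ in range(nruns)]
    return statistics.mean(times), times


if __name__ == "__main__":
    # Dummy workload standing in for run(); replace with the real thing.
    avg, times = mean_runtime(lambda: sum(range(200_000)), 5)
    print("mean %.6f s over %d runs" % (avg, len(times)))
```

Keeping the individual times, not just the mean, makes it easy to spot an outlier run (for example the first run, which may include one-time setup costs).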
2. Revise the serial and parallel implementations so that the number of cells in the network can be specified by a command line statement. In case you aren't already aware of this feature of NEURON,
nrniv -c "statement" foo.hoc
executes "statement" before executing foo.hoc
So if foo.hoc is a program in which a parameter NCELLS controls the number of cells that will be created, then
nrniv -c "NCELLS=9" foo.hoc
will set NCELLS to 9 before executing foo.hoc


Now focus on the serial implementation.

3. Use the ParallelContext class to specify the number of threads used by the serial implementation. After the end of model setup, but before calling run(), insert
objref pc
pc = new ParallelContext()
pc.nthread(nthreads)
where the value of nthreads is specified by a simple assignment statement.
After you have this working, revise your run control code so that it uses a "for loop" to execute a series of runs, each run having a different number of threads, e.g.

objref pc
pc = new ParallelContext()

for nthreads=1,MAXNTHREADS {
  pc.nthread(nthreads)
  // . . . whatever else you have to do before calling run() . . .
  runtime=startsw()
  run()
  runtime=startsw()-runtime
  // . . . whatever else you have to do after this point . . .
}
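Once the loop has produced a mean run time for each thread count, the usual way to summarize it is speedup (time with 1 thread divided by time with n threads) and parallel efficiency (speedup divided by n). A Python sketch of that bookkeeping; the numbers in the example are illustrative only, not measurements.

```python
def speedup_table(mean_times):
    """mean_times maps nthreads -> mean run time in seconds (must include 1).
    Returns nthreads -> (speedup, parallel efficiency) relative to 1 thread."""
    t1 = mean_times[1]
    table = {}
    for n, t in sorted(mean_times.items()):
        speedup = t1 / t
        table[n] = (speedup, speedup / n)
    return table


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    example = {1: 8.0, 2: 4.0, 4: 2.5}
    for n, (s, e) in speedup_table(example).items():
        print("%d threads: speedup %.2f, efficiency %.2f" % (n, s, e))
```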
4. Create a shell script that executes the serial implementation multiple times--each time specifying a different number of cells for the network. Details of the shell script will depend on your software environment and what you want to do. For a very simple example, see
Automating tasks: -c "statement" and batch runs
http://www.neuron.yale.edu/phpBB/viewto ... =28&t=1747


Finally, turning to the parallel implementation--

5. NEURON has no way to change the number of processors used by a distributed model. This must be specified on the command line. For this you need a shell script that will execute a series of commands similar to these:
mpiexec -np 2 nrniv -mpi -c "NUMCELLS=3" ringpar.hoc
mpiexec -np 2 nrniv -mpi -c "NUMCELLS=4" ringpar.hoc
mpiexec -np 2 nrniv -mpi -c "NUMCELLS=5" ringpar.hoc
. . .
mpiexec -np 4 nrniv -mpi -c "NUMCELLS=3" ringpar.hoc
mpiexec -np 4 nrniv -mpi -c "NUMCELLS=4" ringpar.hoc
mpiexec -np 4 nrniv -mpi -c "NUMCELLS=5" ringpar.hoc
This is easily done with a pair of nested "for" loops, but details depend on your operating system and shell.
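As one possible driver for those nested loops, here is a Python sketch that builds the same command lines (outer loop over processor counts, inner loop over cell counts) and either prints them or runs them in sequence. It assumes mpiexec and nrniv are on the PATH and that ringpar.hoc has been revised to read NUMCELLS, as described in step 2; the function names are hypothetical.

```python
import itertools
import shlex
import subprocess


def sweep_commands(nprocs, ncells, hocfile="ringpar.hoc"):
    """One command per (processors, cells) pair: outer loop over
    processors, inner loop over cell counts."""
    return [
        ["mpiexec", "-np", str(n), "nrniv", "-mpi", "-c", f"NUMCELLS={c}", hocfile]
        for n, c in itertools.product(nprocs, ncells)
    ]


def run_sweep(cmds, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    run_sweep(sweep_commands([2, 4], [3, 4, 5]))
```

Passing each command as a list means no shell is involved, so the -c statement reaches nrniv as a single argument without extra quoting.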


As always, test at each step to make sure that your revisions work properly.
jackfolla

Re: A ring network with multithreading

Post by jackfolla »

Hi Ted,
your answer was very exhaustive.

My last problem is the following: the synapses between one cell and the next are placed randomly;
consequently, the serial and parallel versions give different results (and the results also depend on the number of hosts used).
Obviously, this is not entirely correct.

My code is structured in this way:

proc batchrun() { local i localobj q
  q = new Random()
  seed1 = q.discunif(0,9943799)
  for i=1,n_simul {
    seed2 = q.repick()
    // governs which dendrite will have the synapse
    m = new Random()
    m.ACG(seed2)
    m.discunif(0, ndend-1)
    for k=0, NCELL-1 {
      place_syn(k)
    }
    run()
    ...
  }
}
Now, in the parallel case, let nhost be the number of hosts. Then the first nhost cells get identical synapses, because every processor performs exactly the same calculation.

Is the solution to find a method that reproduces the serial case (e.g. through appropriate for loops), or is it possible to use methods like gather and scatter, so that each processor passes the random value it calculated to the next processor?

This second alternative is impractical given that the processors are called in random order.

Thanks so much.
ted

Re: A ring network with multithreading

Post by ted »

jackfolla wrote:your answer was very exhaustive
meaning that implementing those steps will be exhausting . . .
My last problem is the following: the synapses between one cell and the next are placed randomly;
consequently, the serial and parallel versions give different results (and the results also depend on the number of hosts used).
Actually, this is your first problem, and it should be fixed before taking any of the steps that I described. Otherwise, how will you be able to tell if a revision to your program has introduced a bug?

The solution is to associate each cell with its own pseudorandom sequence generator, where the generators are configured to produce number streams that do not overlap each other. For an example of how to do this, see
4. Second example: a network with random connectivity
in our paper
Hines, M.L. and Carnevale, N.T.
Translating network models to parallel hardware in NEURON.
J. Neurosci. Methods 169:425-455, 2008.
(preprint available from http://www.neuron.yale.edu/neuron/nrnpubs).
Also be sure to read the documentation of MCellRan4 in the Programmer's Reference
http://www.neuron.yale.edu/neuron/stati ... #MCellRan4

Since each cell has its own generator, and each generator produces its own stream, proc batchrun() does not need to create any other instances of the Random class. It becomes

proc batchrun() { local i
  for i=1,n_simul {
    place_synapses()
    run()
    // . . .
  }
}
Since I don't know the detailed architecture of your network, the following is somewhat general.

Assume that (1) each host has its own List of cell instances, (2) for each cell instance there is a corresponding synaptic mechanism that you want to reposition, and (3) the synaptic mechanism to be repositioned is a public member of the cell class. Then, on each host, place_synapses() merely has to iterate over that host's List of cells like so (in semi-pseudocode)

proc place_synapses() { local i
  for i=0,list_of_cells.count()-1 {
    // decide which dendrite of cell i should have the synapse attached to it
    // move the synapse to that dendrite of list_of_cells.o(i)
  }
}
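The key property of the per-cell approach is that a cell's synapse location depends only on the cell's global id (and the run number), never on which host owns it or how many hosts there are. The Python sketch below demonstrates that property with a round-robin distribution of cells (gid % nhost == rank, an assumption about how cells are assigned). Python's random.Random seeded from a string is only a stand-in here: its streams are reproducible, but unlike MCellRan4 (used in the paper cited above) they are not designed to be non-overlapping. All function names are hypothetical.

```python
import random


def dendrite_for_cell(gid, run, ndend, base_seed=1):
    """Pick the dendrite that gets cell `gid`'s synapse on a given run.
    The generator is seeded from (base_seed, run, gid) only, so the choice
    does not depend on which host owns the cell or on how many hosts exist."""
    rng = random.Random(f"{base_seed}-{run}-{gid}")  # one stream per cell per run
    return rng.randrange(ndend)


def place_synapses(rank, nhost, ncell, run, ndend):
    """Round-robin ownership (gid % nhost == rank); returns {gid: dendrite}
    for the cells that live on this host."""
    return {gid: dendrite_for_cell(gid, run, ndend)
            for gid in range(rank, ncell, nhost)}


def placement_all_hosts(nhost, ncell, run, ndend):
    """Merge the per-host results, as if gathered from every rank."""
    merged = {}
    for rank in range(nhost):
        merged.update(place_synapses(rank, nhost, ncell, run, ndend))
    return merged


if __name__ == "__main__":
    serial = placement_all_hosts(1, 16, run=0, ndend=4)
    for nhost in (2, 3, 8):
        assert placement_all_hosts(nhost, 16, run=0, ndend=4) == serial
    print(serial)
```

Because no host needs anything from any other host, this also avoids the gather/scatter scheme proposed in the question.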
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: A ring network with multithreading

Post by shailesh »

Came across this post and this bit interested me:
Current versions of NEURON (including any of the alpha installers) can run multithreaded simulations of a model on a single host, or single threaded simulations of a model that is distributed over multiple hosts. It cannot run multithreaded simulations of a model that is distributed over multiple hosts.
Since this was posted around 5 years ago, I was wondering whether it still holds true for NEURON v7.4.
ted

Re: A ring network with multithreading

Post by ted »

shailesh wrote:Came across this post and this bit interested me . . . I was wondering whether it still holds true for NEURON v7.4?
For some time now (at least since 7.3 if not earlier) parallel simulation execution can involve any combination of multithreaded, distributed model, and bulletin board style parallelization.
shailesh

Re: A ring network with multithreading

Post by shailesh »

Thanks, that's good to know.
Consider this scenario: running a network model with 16 cells on the Neuroscience Gateway (NSG) with 2 nodes and 4 cores per node, which gives a total of 8 processors (nhost=8), with the cells distributed equally, 2 per host. How could I make use of multithreading here? As a rule of thumb, I believe the number of threads is set to the number of available processors (here 8). But how would this distribute the workload among the processors? I suppose each core would be managing 2 cells.

Part of the confusion stems from the fact that on my desktop, which has 4 cores, I get pc.nhost = 1.
But on NSG with 2 nodes and 4 cores per node, I get pc.nhost = 8. Why?

Also, if I set the number of threads to, say, n, would there be a total of n threads across the entire system, or n threads per node? I am quite sure it is the former, but want to confirm.