h.frecord_init() and Python Multiprocessing
Posted: Mon Aug 05, 2024 11:07 am
Hello,
I ran into an odd issue with my model in NEURON+Python. I think I have fixed it, but I would appreciate a second opinion to check that I have addressed it correctly.
I am simulating a large network of endocrine cells with lots of variables and need to run long simulations (15-30 minutes) due to slowly changing hormone concentrations that we are interested in studying. For this reason I have followed the advice here on the forum to break the simulations into chunks and have the following workflow:
Code:
from neuron import h

# Create a dictionary with recorded variable names as keys and values as h.Vectors
rec_dict = create_rec_dict(section_list)

# chunk_end_times is a list of end times for each chunk.
# For example, for a 5000 ms simulation run in chunks of 1000 ms,
# it would be the list [1000, 2000, 3000, 4000, 5000]
for chunk_end_time in chunk_end_times:
    # Run the simulation until the end of the next time chunk
    h.continuerun(chunk_end_time)
    # Save chunk data... omitting code for brevity
    # Re-initialize the recording vectors so that the maximum memory consumed stays
    # close to the amount needed to store one time chunk's worth of data
    h.frecord_init()
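For concreteness, a minimal version of the omitted "Save chunk data" step could be something like the sketch below (the save_chunk name and the .npz file format are just for illustration):

Code:
import numpy as np

def save_chunk(rec_dict, chunk_index):
    # Snapshot each recording Vector into an independent numpy array. This runs
    # before h.frecord_init(), so the Vectors still hold the chunk just simulated.
    arrays = {var: vec.as_numpy().copy() for var, vec in rec_dict.items()}
    # Write the chunk to disk (file name and format are illustrative)
    np.savez(f"chunk_{chunk_index:04d}.npz", **arrays)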
This works great, but my simulations were still very long and I wanted to improve efficiency wherever I could. My idea was to use Python multiprocessing: the simulation keeps running in the main process, worker processes handle processing each of my recording h.Vectors, and those workers pass the processed data through a multiprocessing.Manager.Queue to another process that writes the data to disk. That way the simulation can continue while data is being processed and written. It looked something like this:

Code:
import multiprocessing as mp

from neuron import h

# Create a dictionary with recorded variable names as keys and values as h.Vectors
rec_dict = create_rec_dict(section_list)

# Create a multiprocessing manager to manage the data queue
with mp.Manager() as manager:
    # Create the data queue
    data_queue = manager.Queue()
    # Create and start the writer process that will get processed data arrays
    # from the data queue and write them to disk
    data_writer = mp.Process(target=data_writer_func, args=(data_queue,))
    data_writer.start()
    # Create a multiprocessing pool to handle the data_processor worker processes
    with mp.Pool() as pool:
        # chunk_end_times is a list of end times for each chunk.
        # For example, for a 5000 ms simulation run in chunks of 1000 ms,
        # it would be the list [1000, 2000, 3000, 4000, 5000]
        for chunk_end_time in chunk_end_times:
            # Run the simulation until the end of the next time chunk
            h.continuerun(chunk_end_time)
            # Convert the time vector to a numpy array
            time_array = rec_dict["Time"].as_numpy()
            # Save chunk data
            for var, vec in rec_dict.items():
                if var == "Time":
                    continue
                # Convert the vector to a numpy array
                var_array = vec.as_numpy()
                # Hand this chunk to a worker, which processes it and puts the
                # result on data_queue for the writer process
                pool.apply_async(data_processor, args=(var, var_array, time_array, data_queue))
            # Re-initialize the recording vectors so that the maximum memory consumed stays
            # close to the amount needed to store one time chunk's worth of data
            h.frecord_init()
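data_processor and data_writer_func are not shown; roughly, they follow the pattern in this simplified sketch (the column_stack processing and the np.save call are just stand-ins for the real work):

Code:
import numpy as np

def data_processor(var, var_array, time_array, data_queue):
    # Do whatever per-chunk processing is needed (column_stack is only a
    # stand-in here), then hand the result to the writer through the queue
    processed = np.column_stack((time_array, var_array))
    data_queue.put((var, processed))

def data_writer_func(data_queue):
    # Pull processed chunks off the queue and write them out until a None
    # sentinel is put on the queue at the end of the run
    while True:
        item = data_queue.get()
        if item is None:
            break
        var, processed = item
        # Stand-in for the real writing code (e.g. appending to an HDF5 file)
        np.save(f"{var}_latest_chunk.npy", processed)

At the end of the run the main process would put None on the queue, close and join the pool, and join data_writer; I have left that out of the snippet above.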
My simulations became much faster, but when I visualized the data I started to realize that many of the variables appeared constant for multiple time chunks when I know they should be changing. Skipping past the many tests I ran to track down the issue: I believe h.frecord_init() is being called before all of the worker processes have started, which resizes the recording vectors. Since a numpy array created with the as_numpy method maps to the same place in memory as the recording vector it came from, any change to the recording vector is also visible in the numpy array. Therefore, any worker process that has not yet started when h.frecord_init() is called sees its data array reduced to the last value, and that is why the data appears constant when I plot it.

I tested this by running a very small simulation (it takes around 1 second to complete) with 2 time chunks and adding the statement
Code:
import time
# Sleep for 10 seconds
time.sleep(10)
right before my call to h.frecord_init(). This corrected the issue. I also tested it by creating deep copies of my numpy arrays (copies stored in a different place in memory than the original recording vectors) and passing those copies to my worker processes, and this also corrected the issue.
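To make the memory-sharing point concrete, here is a minimal standalone sketch (a single hh soma with made-up names, not my actual model) showing that as_numpy() returns a view whose contents change after h.frecord_init() and further simulation, while an explicit copy keeps the original chunk:

Code:
from neuron import h
import numpy as np

h.load_file("stdrun.hoc")

# Toy cell just to have something to record from
soma = h.Section(name="soma")
soma.insert("hh")

v_vec = h.Vector().record(soma(0.5)._ref_v)

h.finitialize(-65)
h.continuerun(5)

view = v_vec.as_numpy()             # a view into the Vector's buffer, not a copy
snapshot = v_vec.as_numpy().copy()  # an independent copy of the first chunk

h.frecord_init()                    # recording vectors are re-initialized here
h.continuerun(10)                   # simulate the next chunk

# 'snapshot' still holds the first chunk; 'view' now reflects whatever is in
# (or was left in) the Vector's buffer, so the two generally no longer match
print(np.array_equal(snapshot, view))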
Does all this make sense? And if so, does anyone have any recommendations for best practices on how to have multiple processes handle running the simulation, processing the data, and writing it to disk?

Any advice or thoughts are appreciated!