Availability: ncbo, ncea, ncecat, ncflint, ncpdq, ncra, ncrcat, ncwa
Short options: ‘-t’
Long options: ‘--thr_nbr’, ‘--threads’, ‘--omp_num_threads’
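If the user does not request a specific number of threads thr_nbr with
the ‘-t’ switch (or one of its long option equivalents), thr_nbr is
obtained from the OMP_NUM_THREADS environment variable, if present, or
from the OS, if not.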
NCO may modify thr_nbr according to its own internal
settings before it requests any threads from the system.
Certain operators contain hard-coded limits on the number of threads they
request.
These limits reflect our experience and common sense, and are meant to
reduce potentially wasteful system usage by inexperienced users.
For example, ncrcat is extremely I/O-intensive, so we restrict
thr_nbr <= 2 for ncrcat.
This is based on the notion that the best performance that can be
expected from an operator which does no arithmetic is to have one thread
reading and one thread writing simultaneously.
In the future (perhaps with netCDF4), we hope to
demonstrate significant threading improvements with operators
like ncrcat by performing multiple simultaneous writes.
Compute-intensive operators (ncwa and ncpdq)
are expected to benefit the most from threading.
The greatest increases in throughput due to threading will occur on
large datasets where each thread performs millions or more floating
point operations.
Otherwise, the system overhead of setting up threads may outweigh
the theoretical speed enhancements due to SMP parallelism.
However, we have not yet demonstrated that the SMP parallelism
scales well beyond four threads for these operators.
Hence we restrict thr_nbr <= 4 for all operators.
We encourage users to play with these limits (edit file
nco_omp.c) and send us their feedback.
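The selection and capping of thr_nbr described above can be sketched in a few lines of Python. This is an illustration only, not the code in nco_omp.c: the function name, the limits table, and the use of the CPU count as the OS fallback are assumptions, and the sketch assumes that an explicit ‘-t’ request takes precedence over OMP_NUM_THREADS. The limits themselves (2 for ncrcat, 4 for all operators) are the ones quoted in the text.

```python
import os

# Limits quoted in the text above; the authoritative values live in nco_omp.c.
GLOBAL_THREAD_LIMIT = 4
OPERATOR_THREAD_LIMITS = {"ncrcat": 2}  # I/O-bound: one reader, one writer


def initial_thread_request(operator, user_thr_nbr=None, env=None):
    """Hypothetical helper: return the thr_nbr an operator would request."""
    env = os.environ if env is None else env
    omp_value = env.get("OMP_NUM_THREADS", "").strip()

    if user_thr_nbr is not None:
        # Assumed precedence: an explicit -t / --thr_nbr request comes first.
        thr_nbr = user_thr_nbr
    elif omp_value.isdigit():
        # Only a plain integer is handled here; other forms fall through.
        thr_nbr = int(omp_value)
    else:
        # Stand-in for "from the OS": use the number of CPUs Python reports.
        thr_nbr = os.cpu_count() or 1

    # Apply the operator-specific limit, never exceeding the global limit.
    limit = min(GLOBAL_THREAD_LIMIT,
                OPERATOR_THREAD_LIMITS.get(operator, GLOBAL_THREAD_LIMIT))
    return max(1, min(thr_nbr, limit))


if __name__ == "__main__":
    for op in ("ncrcat", "ncwa"):
        print(op, initial_thread_request(op, user_thr_nbr=8, env={}))
```

With these assumed limits, a request for eight threads is reduced to two for ncrcat and to four for ncwa before any threads are requested from the system.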
Once the initial thr_nbr has been modified for any operator-specific limits, NCO requests the system to allocate a team of thr_nbr threads for the body of the code. The operating system then decides how many threads to allocate based on this request. Users may keep track of this information by running the operator with dbg_lvl > 0.
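As a small illustration of requesting threads and raising the debugging level together, the sketch below assembles an operator command line without running it. The helper name and the file names in.nc and out.nc are hypothetical; ‘-t’ is the threading switch documented above, and ‘-D’ is assumed to be the short option that sets dbg_lvl.

```python
import shlex


def build_nco_command(operator, thr_nbr, dbg_lvl, in_file, out_file):
    """Hypothetical helper: build an argument list for a threaded NCO run."""
    return [
        operator,
        "-t", str(thr_nbr),   # number of threads to request
        "-D", str(dbg_lvl),   # dbg_lvl > 0 reports threading information
        in_file,
        out_file,
    ]


if __name__ == "__main__":
    cmd = build_nco_command("ncwa", 4, 1, "in.nc", "out.nc")
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```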
By default, threaded operators attach one global attribute to any
file they create or modify.
The nco_openmp_thread_number global attribute contains the
number of threads the operator used to process the input files.
This information helps to verify that the answers with threaded and
non-threaded operators are equal to within machine precision.
This information is also useful for benchmarking.
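One way to check that threaded and non-threaded answers agree to within machine precision is sketched below. It operates on plain sequences of floats that are assumed to have already been read from the two output files; the function name and the default tolerance of 16 machine epsilons are illustrative assumptions, not values prescribed by NCO.

```python
import math
import sys


def agree_to_machine_precision(serial, threaded, ulps=16):
    """Return True if paired values differ by at most `ulps` machine epsilons.

    The tolerance is applied relatively, with the same value as an absolute
    floor so that results near zero are still compared sensibly.
    """
    tol = ulps * sys.float_info.epsilon
    if len(serial) != len(threaded):
        return False
    return all(
        math.isclose(a, b, rel_tol=tol, abs_tol=tol)
        for a, b in zip(serial, threaded)
    )


if __name__ == "__main__":
    values = [0.1] * 10
    # A single thread sums left to right; two threads sum halves and combine.
    serial_sum = sum(values)
    threaded_sum = sum(values[:5]) + sum(values[5:])
    print(agree_to_machine_precision([serial_sum], [threaded_sum]))
```

The example imitates why exact equality is too strict a test: splitting a sum across threads changes the order of the additions, which can alter the last bits of the result.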