This FAQ is specific to Open-MX version 1.0.x. If using a more recent release (starting with 1.0.90), you should refer to the relevant FAQ. See the main FAQ for more links.
If you do not find your answer here, feel free to contact the open-mx mailing list.
Open-MX is a software implementation of Myricom's Myrinet Express protocol. It aims at providing high-performance message passing over any generic Ethernet hardware.
Open-MX implements the capabilities of the MX firmware (running in Myri-10G NICs) as a driver in the Linux kernel. A user-space library exposes the MX interface to legacy applications.
Open-MX implements the MX programming interface with API and ABI compatibility, and it is also wire-compatible with MX-over-Ethernet. See the Native MX Compatibility section for details.
There are some tiny differences between the MX and Open-MX implementations:
There are also some tiny differences between the native MX and Open-MX programming interfaces. These differences are hidden by the Open-MX API/ABI compatibility layer. However, if you plan to use the Open-MX-specific API directly, you might want to know that:
Open-MX supports Linux on any architecture.
The Open-MX driver works at least on Linux kernels >=2.6.15. Kernels older than 2.6.15 are unlikely to ever be supported due to various important functions being unavailable (especially vm_insert_page).
The Open-MX driver is regularly updated for newer kernels, making it likely to work on the latest stable kernel even before it is actually released.
Open-MX works on all Ethernet hardware that the Linux kernel supports. The only requirements are that the MTU is large enough (see the MTU question below) and that all connected peers are on the same LAN, which means there is no router between them (switches are OK).
Open-MX was designed to be compatible with the MX wire specifications, which means that 4 kB frames have to be accepted by the network. All boards and switches on the path therefore need to support a 4144-byte MTU.
If Open-MX is configured in non-MX-wire-compatible mode (by passing --disable-mx-wire to the configure script), it uses larger frames to improve performance by reducing the packet rate. The required MTU is then 8224.
It is also possible to force Open-MX to work with an MTU of 1500 (by passing --disable-mx-wire --with-mtu=1500 to the configure script). However, this option is not recommended for bandwidth-intensive applications.
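For instance, the three MTU-related build configurations described above may be selected as follows (a sketch; any other configure options are omitted):
$ ./configure                                    # MX-wire-compatible, requires a 4144-byte MTU
$ ./configure --disable-mx-wire                  # larger frames, requires an 8224-byte MTU
$ ./configure --disable-mx-wire --with-mtu=1500  # standard MTU, lower large-message performance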
Yes. Open-MX talks to the Ethernet layer as IP does, but it does not use the same Ethernet packet type. It means that IP and Open-MX can perfectly coexist on the same network and drivers, thanks to the operating system passing the incoming packets to the corresponding receive stack.
Yes. By default, Open-MX will encode its packet headers in network order, unless --disable-endian has been given to the configure script. Open-MX can thus make big-endian architectures talk to little-endian ones, 32-bit hosts talk to 64-bit ones, and so on.
However, it is obviously up to the application to make sure that its data is passed through the network in an endian-independent way.
Bugs should be reported as Gitlab Issues or sent to the open-mx mailing list. Questions may be asked there too.
Lots of information might be useful when diagnosing a bug, see the REPORTING-BUGS file in the source tree for details.
Assuming you want to connect 2 nodes using their 'eth2' interface:
$ ./configure
$ make
$ make install
Note that if building from SVN, you will need to generate the configure script and common/omx_config.h.in header first. autoconf and autoheader are required to do so:
$ autoconf
$ autoheader
$ ifconfig eth2 up mtu 9000
$ /path/to/open-mx/sbin/omx_init start ifnames=eth2
$ omx_info
node1:0 (board #0 name eth2 addr 01:02:03:04:05:06)
==============================================
  1) 01:02:03:04:05:06 node1:0
  2) a0:b0:c0:d0:e0:f0 node2:0
node1 $ omx_perf
Successfully attached endpoint #0 on board #0 (hostname 'node1:0', name 'eth2', addr 01:02:03:04:05:06)
Starting receiver...
then on the second node:
node2 $ omx_perf -d node1:0
Successfully attached endpoint #0 on board #0 (hostname 'node2:0', name 'eth2', addr a0:b0:c0:d0:e0:f0)
Starting sender to node1:0...
You should get performance numbers such as
length       0:    7.970 us     0.00 MB/s     0.00 MiB/s
length       1:    7.950 us     0.00 MB/s     0.00 MiB/s
[...]
length 4194304: 8388.608 us   500.00 MB/s   476.83 MiB/s
Open-MX implements the Myrinet Express (MX) protocol and application interface on top of regular Ethernet hardware. A user-space library manages MPI-like requests and passes them to the Open-MX driver, which maps them directly onto the software Ethernet layer of the Linux kernel. Packets are sent/received through the underlying (unmodified) driver in a way similar to MX.
The Open-MX driver is always thread-safe. The user-space library is not thread-safe by default. You should pass --enable-threads to the configure script to enable thread safety.
Yes. Open-MX may use a software loopback to send messages from one endpoint to itself (self communications) or to another endpoint of any interface of the same host (shared communications). This loopback is faster than going on the network up to a switch and then coming back. And it is guaranteed to work (while some switches do not send packets back to their sender).
If using a single node, it is possible to only attach the loopback interface (lo) to Open-MX and let the stack switch to optimized self or shared-memory communication.
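For instance, on a standalone machine, attaching only the loopback interface could look like this (a sketch following the ifnames syntax shown above):
$ /path/to/open-mx/sbin/omx_init start ifnames=lo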
If an Open-MX function fails for any reason (resource shortage, invalid parameters given by the application, ...), or if a request completes with an erroneous status code (remote endpoint closed or non-responding, ...), Open-MX will by default abort and display an error message. See How to debug an abort message? to find out where the problem comes from.
This behavior is caused by the default error handler, which may be changed by applications through the omx_set_error_handler function. It is also possible to change it at runtime by setting OMX_FATAL_ERRORS=0 in the environment. All error codes will then be returned to the application instead of aborting from within the Open-MX library.
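For instance, to make Open-MX return error codes to the application instead of aborting (myapp being a placeholder for an actual Open-MX program):
$ export OMX_FATAL_ERRORS=0
$ ./myapp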
The Open-MX library will also abort under some circumstances, even if fatal errors have been disabled by the user. Apart from internal assertions detecting an implementation bug, the main reason for aborting is when the driver closes an endpoint by force. Fortunately, it only occurs in rare circumstances such as Ethernet hardware failure or the administrator closing an interface.
If you think you found a bug, see What if I find a bug?.
$ ./configure
$ make
$ make install
To display the full build command lines instead of the default short messages, pass V=1 to make.
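For instance:
$ make V=1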
By default, Open-MX will be installed in /opt/open-mx. Use --prefix on the configure line to change this (or set prefix on the 'make install' command line).
Open-MX provides the omx_init initialization script, which takes care of loading/unloading the driver and managing the peer table.
$ sbin/omx_init start
To choose which interfaces have to be attached, some module parameters may be given on the command line:
$ sbin/omx_init start ifnames=eth1
By default, Open-MX will install in /opt/open-mx. It is possible to change this path by passing --prefix=</new/path> to the configure script, or by passing prefix=</new/path> on the make install command line.
All Open-MX installed files should be available on all nodes since the driver and some tools are required on startup. It is thus recommended to use an NFS-shared directory as the above prefix.
To simplify Open-MX startup, you might want to install the omx_init script within the startup scripts on each node:
$ sbin/omx_local_install
Open-MX may then be started with:
$ /etc/init.d/open-mx start
You might want to configure your system to auto-load this script at startup.
See Managing interfaces to configure which interfaces have to be attached on startup.
Most NFS configurations do not allow root on the client to operate as root on the server's files. When running 'make install' as root, you might experience problems because some Makefiles (especially the kernel driver's one) might modify some files before actually installing anything.
To work around this, assuming everything has been built as non-root before, you may use (as root)
$ make installonly
so that it really only installs things without checking whether the build is up-to-date.
If the application or the MPI implementation is dynamically linked against Open-MX, there is nothing to do since the Open-MX library ABI (binary interface) is stable and will not change when reconfiguring/recompiling.
However, there is also an internal binary interface between the library and the kernel driver. If you reconfigure Open-MX in a different way, and load the new kernel module, the corresponding new library should be used as well. In case of dynamic linking, it should be transparent assuming the new library replaced the old file. In case of static linking, the above application or MPI implementation should be relinked against the new Open-MX static library.
There are actually some cases where the driver configuration may be changed without requiring a new library. For instance, if the MTU or MX-wire-compatibility configuration changes, the library will dynamically adapt its behavior according to information provided by the driver at startup.
The kernel module should preferably be compiled with the same compiler that was used to build the kernel. To change the compiler for the kernel module, pass KCC=<othercompiler> on the configure or make command line.
During configure, Open-MX checks the running kernel with 'uname -r' and builds the open-mx module against it, using its headers and build tree in /lib/modules/`uname -r`/{source,build}. To build for another kernel, use
$ ./configure --with-linux=/path/to/kernel/headers/
To build using another kernel build tree, use
$ ./configure --with-linux=/path/to/kernel/headers/ --with-linux-build=/path/to/kernel/build/tree/
If for some reason (for instance for multiple kernel support) you need to rebuild the driver for a different kernel, it is possible to avoid reconfiguring/rebuilding the whole tree. You need to pass the kernel build path, kernel headers path and kernel release number:
$ make driver LINUX_BUILD=</path/to/kernel/build> LINUX_HDR=</path/to/kernel/headers> LINUX_RELEASE=<version>
$ make driver-install LINUX_BUILD=</path/to/kernel/build> LINUX_HDR=</path/to/kernel/headers> LINUX_RELEASE=<version>
Once the module is loaded, udev creates a /dev/open-mx file which is used by user-space libraries and programs. Additionally, the Open-MX init script will create the device node in case udev was not running. The --with-device configure option may be used to change the name of this device file, its group or mode. Write access to this file is required when using Open-MX.
There is also another /dev/open-mx-raw device file, which may be used by the peer discovery process to send/receive raw packets. It may be configured similarly with --with-raw-device.
By default, when loading the Open-MX driver, all existing network interfaces in the system are attached (up to 32 by default), except the ones that are not Ethernet, are not up, or have a too-small MTU.
To change the order or select which interfaces to attach, you may use the ifnames module parameter when loading:
$ /path/to/open-mx/sbin/omx_init start ifnames=eth2,eth3
$ insmod lib/modules/.../open-mx.ko ifnames=eth3,eth2
Once Open-MX has been installed with omx_local_install, the /etc/open-mx/open-mx.conf file may be modified to configure which interfaces should be attached at startup.
The current list of attached interfaces may be observed by reading the /sys/module/open_mx/parameters/ifnames special file. Writing 'foo' or '+foo' in the file will attach interface 'foo'. Writing '-bar' will detach interface 'bar', except if some endpoints are still using it. To force the removal of an interface even if some endpoints are still using it, '--bar' should be written in the special file. Multiple commands may be sent at once by separating them with commas.
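For instance, to check the current list, then attach eth3 and detach eth2 in a single command (the interface names are placeholders):
$ cat /sys/module/open_mx/parameters/ifnames
$ echo "+eth3,-eth2" > /sys/module/open_mx/parameters/ifnames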
Finally, it has to be noted that the dynamic peer discovery cannot discover newly attached or detached local interfaces. As soon as the list of local interfaces changes, the local discovery process should be restarted (see Peer Discovery):
$ omx_init restart-discovery
These interfaces must be 'up' in order to work.
$ ifconfig eth2 up
However, having an IP address is not required.
Also, the MTU should be large enough for Open-MX packets to transit. 9000 will always be enough. Look in dmesg for the actual minimal MTU size, which may depend on the configuration. A relevant warning will be displayed in dmesg if needed.
$ ifconfig eth2 mtu 9000
If one of the above requirements is not met, a warning should be printed in user-space when opening an endpoint.
The list of currently open endpoints may be seen with:
$ omx_endpoint_info
The interfaces may also be observed with the omx_info user-space tool.
Yes. Open-MX requires all communication endpoints to be attached to an interface, even if it is not used by actual network traffic underneath. It is fortunately possible to attach the loopback interface (lo) and either use it as a regular interface talking to itself, or bypass it and use the optimized shared communication.
Each Open-MX node has to be aware of the hostnames and MAC addresses of all other peers.
By default, a dynamic peer discovery is performed but it is also possible to enter a static list of peers manually.
The --enable-static-peers option may be used on the configure command line to switch from dynamic to static peer table. It is also possible to switch later by passing --dynamic-peers or --static-peers to the omx_init startup script.
It is possible to restart the peer table management process without restarting the whole Open-MX driver with:
$ omx_init restart-discovery
This is especially important when attaching or detaching interfaces at runtime while using dynamic peer discovery. It may also be used, for instance, to switch between static and dynamic peer tables.
Dynamic discovery may sometimes take several seconds before all nodes become aware of each other. If the fabric is always the same, it is possible to set up a static peer table using a file. To do so, Open-MX should be configured with --enable-static-peers.
A file listing peers must be provided to store the list of hostnames and MAC addresses in the driver. The omx_init_peers tool may be used to set up this list. The omx_init startup script takes care of running omx_init_peers automatically using /etc/open-mx/peers when it exists.
The contents of the file is one line per peer, each line containing 2 fields (separated by spaces or tabs):
* a MAC address (6 colon-separated numbers)
* a board hostname (<hostname>:<ifacenumber>)
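For instance, a peers file describing the two nodes from the quick-start example above could look like this (hostnames and MAC addresses are obviously site-specific):
01:02:03:04:05:06  node1:0
a0:b0:c0:d0:e0:f0  node2:0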
To change the location of the peers file, it is possible to use the --with-peers-file=<path> configure option, or the --static-peers=<path> omx_init option.
If Open-MX has been configured for dynamic peer discovery by default, the --static-peers omx_init option may also be used to switch to static peer table.
FMA is Myricom's fabric management system. It is used in MX by default. If you plan to make MX and Open-MX interoperable, or just want a scalable and powerful peer discovery tool, you may tell Open-MX to use FMA instead of the default omxoed dynamic peer discovery program.
FMA is available from Myricom's FMS page or may be copied from the MX source tree.
To build and use FMA, just unpack the FMA source within the Open-MX source directory (as a fma/ subdirectory), then run configure, build, and install as usual.
FMA only correctly supports MX-over-Ethernet (or fabrics mixing MX-over-Ethernet and MX-over-Myrinet nodes) starting with version 1.3.0. So, if running FMA as the peer discovery tool for Open-MX, at least FMA 1.3.0 is needed.
The same FMA version should be running on all nodes. This point is especially important if the fabric mixes MX and Open-MX nodes (see What is MX-wire-compatibility?). There are two easy ways to make sure the same FMA is used on MX and Open-MX nodes:
When mixing Open-MX and native MX hosts on the same fabric, it is required that the peer discovery processes are compatible. MX uses FMA by default, so Open-MX should be configured to use FMA in this case. If MX was specifically configured to use mxoed, then Open-MX may keep using its default discovery tool, omxoed, which is compatible with mxoed.
FMA may also be much slower than omxoed on small networks. Since small networks are the main use case of Open-MX, it is recommended to keep using the default configuration (i.e. use omxoed) unless the fabric contains some native MX hosts.
Setting up a static peer table is faster than both FMA and omxoed, but it obviously only works for static fabrics. Note that it is possible to manually add some peers later using the omx_init_peers tool.
Dynamic peer discovery
By default, Open-MX uses the omxoed program to dynamically discover all peers connected to the fabric, including the ones added later. The only requirement is that the omxoed program runs on each peer.
If Myricom's FMA source directory is unpacked within the Open-MX source (as the "fma" subdirectory), Open-MX will automatically switch (at configure time) to using FMA instead of omxoed as a peer discovery program. Using FMA is especially important when talking to native MX hosts since they will use FMA by default as well.
The discovery program is started automatically by the omx_init startup script. If Open-MX has been configured to use a static peer table by default, it is still possible to switch to dynamic discovery by passing --dynamic-peers to omx_init.
It is also possible to switch from fma to omxoed by passing the option --dynamic-peers=omxoed to omx_init.
Open-MX may manage up to 65536 peers on the fabric. However, since such big fabrics are quite unusual, the Open-MX driver only supports 1024 peers by default. This threshold may be increased when loading the driver by passing the module parameter peers=N.
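For instance, to raise the limit to 4096 peers when loading the driver with modprobe (4096 is only an example value, and the module must be installed where modprobe can find it):
$ modprobe open-mx peers=4096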
Open-MX exports a message-passing programming interface to applications. It also exports another interface called "raw" used by peer discovery programs to manage the peer table in the driver.
Unless you have a very good reason to not use the existing peer discovery programs or a static peer table, you really do not want to look at the raw interface. The regular message-passing interface should provide everything you need.
To get best performance for benchmarking purposes between homogeneous hosts, you might want to:
Open-MX enables 2 types of wire-compatibility by default, native-MX compatibility and endian-independent compatibility. Disabling them when they are not needed may improve the performance.
If native MX compatibility is not required on the wire, you might want to pass --disable-mx-wire to the configure command line so that larger packets are used for large messages. Note that the required MTU will jump from roughly 4 kB to 8 kB. See Native MX Compatibility for details about wire compatibility and MTU support.
If the machines on the network all use the same endian-ness, you might want to pass --disable-endian to the configure command line so that Open-MX does not swap header bits into/from network order. It may reduce the latency very slightly.
Achieving optimal performance requires avoiding memory copies as much as possible. This is done using memory registration, which pins buffers in physical memory. Since this operation is expensive, it is worth performing it only once per buffer when the buffer is used multiple times. To do so, you should set the OMX_RCACHE environment variable to 1.
$ export OMX_RCACHE=1
However, this configuration may be dangerous if the application frees the buffer in the meantime. Since Open-MX has no way to detect this for now, this registration cache should be used with caution.
Most Ethernet drivers use interrupt coalescing to avoid interrupting the host once per incoming packet. While this is good for the throughput, it increases the latency a lot, by up to several dozen microseconds.
To get the best latency for Open-MX, interrupt coalescing should be reduced. The easiest way to do so is to disable it completely.
$ ethtool -C eth2 rx-usecs 0
However, it is often better to set the coalescing delay close to the latency, so that the observed latency remains near-optimal while there is still a bit of coalescing for consecutive packets. So, assuming you observe an N-microsecond latency with Open-MX when interrupt coalescing is disabled, a good configuration is to set coalescing to N or N-1 microseconds:
$ ethtool -C eth2 rx-usecs <N-1>
If your driver supports adaptive interrupt coalescing, it may well help Open-MX performance significantly. Indeed, it basically disables coalescing automatically (and thus improves latency) when the packet rate is low, and re-enables a high coalescing delay (and thus improves the overall performance) when the packet rate is high. Thus, when it is supported, you probably want to enable adaptive interrupt coalescing on the receive side:
$ ethtool -C eth2 adaptive-rx on
Then, if you still do not observe optimal performance, you may want to tune adaptive coalescing so that, for instance, a pingpong-like pattern gets the best latency. Since a 6-microsecond pingpong generates about 83 thousand packets per second, you may for instance tell the driver to disable coalescing entirely when fewer than 150 thousand packets are received per second:
$ ethtool -C eth2 pkt-rate-low 150000
$ ethtool -C eth2 rx-usecs-low 0
Open-MX may use a software loopback to send messages from one endpoint to itself (self communications) or to another endpoint of any interface of the same host (shared communications). If these shared/self communications are not needed, the library overhead may be slightly reduced by disabling them either at build time with --disable-self or --disable-shared, or at runtime by setting OMX_DISABLE_SELF=1 or OMX_DISABLE_SHARED=1 in the environment.
This is especially the case if there is a single process on each node and it does not talk to itself, or if multiple processes on the same node do not talk to each other.
Many modern platforms, such as Intel I/OAT-enabled servers, provide a hardware DMA engine to offload memory copies. Open-MX performance may increase very significantly thanks to this feature.
The support for dmaengine is automatically built in Open-MX when supported by the kernel and may be configured at runtime through several module parameters. See Advanced configuration for details.
Note that DMA engine hardware may still require the administrator to load the corresponding driver, for instance the 'ioatdma' kernel module. The kernel logs will display the DMA engine status when loading Open-MX or modifying some module parameters.
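For instance, on an I/OAT-capable machine, loading the DMA engine driver and then checking the kernel logs could look like this (a sketch; the exact messages depend on the hardware and kernel):
$ modprobe ioatdma
$ dmesg | tail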
Yes. The Open-MX receive stack is composed of a kernel routine running in the bottom half on any of the machine cores, depending on where the NIC is sending its IRQs. Device drivers usually configure IRQs to be sent to all cores in a round-robin fashion. This behavior distributes the receive workload on all cores, which is good for the vast majority of MPI jobs where each core runs exactly one process.
If you plan to have fewer processes than cores, you might experience some performance degradation caused by idle cores going to sleep and thus taking more time to process incoming IRQs. A dirty way to work around this problem is to prevent cores from sleeping by booting the kernel with the idle=poll parameter.
Or you may restrict the IRQs coming from the NIC to the subset of cores that run the Open-MX processes. For instance, if your processes are bound to cores #0 and #1, the IRQ affinity mask should be set to 3 using:
$ echo 3 > /proc/irq/<irq>/smp_affinity
where <irq> is the IRQ line of the NIC.
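The IRQ line(s) used by an interface such as eth2 can usually be found in /proc/interrupts, for instance:
$ grep eth2 /proc/interrupts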
Under extreme circumstances, for instance for benchmarking purposes, you may want to use a single process per machine and bind it to a different core from the one receiving IRQs. This way, they will not fight for CPU time. However, since cache line sharing is critical, the binding should be done on the very next core so that the cost of cache effects remains very small. For instance, binding IRQs on core #1 and the process on core #0:
$ echo 2 > /proc/irq/<irq>/smp_affinity
$ numactl --physcpubind 0 myprocess
Another way to bind processes is to use the OMX_PROCESS_BINDING environment variable (see What are Open-MX runtime configuration options?).
Such a configuration may be the best for benchmarking purposes, especially on the latency side. However, under a normal load, having IRQs go to all cores is probably a good idea since most applications will use one process per core.
Note that the core numbering is far from being linear in modern machines. It is likely that cores numbered as #0 and #1 by the software are actually not close to each other in the actual hardware. The numbering is often a round-robin across physical processors to maximize memory bandwidth or so.
Some old kernels (<2.6.18) have problems with some drivers that receive data in frags (non-linear skbuff). As a workaround, they will linearize these skbuffs unless their target protocol stack explicitly supports non-linear skbuff. This basically adds a memory copy for all packets except IPv4 and IPv6, which would decrease Open-MX performance.
To avoid this, if IPv6 is not in use on the network, you might want to tell Open-MX to use the IPv6 Ethernet type. This way, its skbuffs will not be linearized uselessly. To enable this workaround, you should pass --with-ethertype=0x86DD to the configure command line.
Note that this solution is only required under very special circumstances and should be avoided in most of the cases.
If you need some Open-MX hosts to talk to some MX hosts, you should keep the wire-compatibility enabled (it is enabled by default). If you only have Open-MX hosts talking on the network, you can disable it to improve performance (see Performance Tuning).
Once Open-MX is configured in wire-compatible mode, you need to make sure that the nodes running in native MX mode use a recent MX stack (at least 1.2.5 is recommended) configured in Ethernet mode. Once the peer tables are set up on both MX and Open-MX nodes, the fabric is ready.
You have to make sure that the same peer discovery program (or "mapper") is used on both sides. MX uses FMA by default, so the FMA source should be unpacked as a "fma" subdirectory of the Open-MX source so that the configure script enables FMA instead of omxoed for dynamic peer discovery.
Note that not all FMA versions are wire-compatible with each other, even if the underlying MX and/or Open-MX stacks are compatible. See Which FMA version should I use? for details.
Under some circumstances, MX may also rely on mxoed, which is compatible with Open-MX' omxoed.
The peer table should be set up on the Open-MX nodes as usual with omx_init_peers, with a single entry for each Open-MX peer and each MX peer.
On the MX nodes, each Open-MX peer with name "myhostname:0" and MAC address 00:11:22:33:44:55 should be added with:
$ mx_init_ether_peer 00:11:22:33:44:55 00:00:00:00:00:00 myhostname:0
Note that MX 1.2.5 is required for mx_init_ether_peer to be available.
Also note that it is possible to let the regular MX dynamic discovery map the MX-only fabric and then manually add the Open-MX peers. To do so, the regular discovery should first be stopped with:
$ /etc/init.d/mx stop-mapper
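The Open-MX peers can then be added manually, for instance for the hypothetical peer used above:
$ mx_init_ether_peer 00:11:22:33:44:55 00:00:00:00:00:00 myhostname:0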
The Open-MX API is slightly different from that of MX, but Open-MX provides a compatibility layer which enables:
This compatibility is enabled by default and has a very low overhead since it only involves going across basic conversion routines.
If you do not plan to use any application that has been written for MX, it is possible to disable the API and ABI compatibility altogether by passing --disable-mx-abi to the configure script.
If you only use Open-MX and never link your application with MX, you may want to hardwire the translation from the MX API into Open-MX calls at compile time to avoid going across these conversion routines at runtime. To do so, you may pass --enable-mx-translate to the configure script.
Note that enabling this automatic API translation does not disable the MX ABI support in Open-MX since it may be required by some sanity checks when building external applications. It is thus still possible to link MX applications with Open-MX. But any application built against Open-MX will not be MX ABI compatible in this case.
Open-MX provides the binary interface of MX 1.2.x, which is also backward compatible with any application built on an older MX (up to 0.9). So if you built your application on top of MX (unless it was 10 years ago), it will work fine with Open-MX.
Open-MX is wire compatible with MX 1.2.x. It means that a host running the native MX stack 1.1 or earlier will not be able to talk with an Open-MX host.
Also, since not all MX versions ship the same FMA version, if you want to use FMA as a peer discovery tool, you might want to look at Which FMA version should I use?.
The following options may be passed to the configure command line before building:
The following environment variables may be used to change the library behavior at runtime:
The following module parameters may be passed to the driver module when loading, either as a parameter to the modprobe command, or through the OMX_MODULE_PARAMS variable for the omx_init or /etc/init.d/open-mx startup script. Some of them may also be modified later by writing into /sys/module/open_mx/parameters/<parameter>.
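For instance, the ifnames and peers parameters described above might be passed in either of the following ways (a sketch; whether the environment variable is picked up depends on how the startup script is invoked on your system):
$ modprobe open-mx ifnames=eth2 peers=4096
$ OMX_MODULE_PARAMS="ifnames=eth2 peers=4096" /etc/init.d/open-mx start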
If you plan to use Open-MX within a middleware such as an MPI layer, you should read the following configuration advice:
Open-MX provides several debugging features such as verbose messages, additional checks, non-optimized building, valgrind hooks, ... For performance reasons, they are not enabled by default.
By default, Open-MX will build a non-debug library and an optional debug library. The former is installed in $prefix/lib while the latter goes in $prefix/lib/debug. The driver is built without debug by default.
If you think you found a bug, see What if I find a bug?.
Passing --disable-debug to the configure command line will only disable the build of the debug library. Passing --enable-debug will make only the debug library be built and installed in $prefix/lib as usual, and the driver will be built with debugging enabled.
The build flags may be configured by passing CFLAGS on the configure command line. Additional flags may be passed for the debugging library build with DBGCFLAGS.
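For instance (the flag values are only illustrative):
$ ./configure CFLAGS="-O3" DBGCFLAGS="-O0 -g"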
Open-MX may abort the application under many circumstances. If you wish to attach a gdb to debug the process before it actually aborts, you may pass OMX_ABORT_SLEEPS=30 in the environment so that the actual abort is deferred by 30 seconds. The pid of the process will be displayed in the meantime. See also What happens on error?.
Yes. Open-MX maintains per-interface statistics at the driver level (even if debugging is disabled). They may be observed with
$ omx_counters
This defaults to the first interface. You may pass the -b option to select another interface. Only the non-null counters are displayed, unless -v is given. These counters may also be cleared with -c.
Open-MX also maintains statistics regarding local communication (shared-memory). They may be observed with
$ omx_counters -s
When the SIGUSR1 signal is sent to an Open-MX program, the library will dump its status on the standard output, including all known peers and pending requests.
This feature is enabled by default in the debug library only. It may be enabled at runtime by setting the OMX_DEBUG_SIGNAL environment variable to 1 or more (more means more status details will be displayed). This feature may also be disabled in the debug library if the variable is set to 0. If a numeric value is given in the OMX_DEBUG_SIGNAL_NUM environment variable, it will replace the default signal number (SIGUSR1).
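For instance, to enable the feature in the non-debug library and dump the status of a running program (myapp is a placeholder, <pid> its process id):
$ OMX_DEBUG_SIGNAL=1 ./myapp &
$ kill -USR1 <pid>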
The driver configuration depends on many static/dynamic configuration parameters (See Advanced Configuration). To dump this configuration, you may read from the device file:
$ cat /dev/open-mx
Open-MX 0.9.2 (git-svn r2053)
Driver ABI=0x151
Configured for 32 endpoints on 32 interfaces with 1024 peers
[...]
This output may also be reported by the startup script:
$ omx_init status
Open-MX: FatalError: Failed to create user region 4, driver replied Bad address
omx_misc.c:86: omx__ioctl_errno_to_return_checked: Assertion `0' failed.
This fatal error means that the application passed an invalid buffer to Open-MX. So the Open-MX driver failed to pin the buffer in physical memory when starting a large message.
It is very similar to a segmentation fault (an actual access to the buffer would have caused a fault). The application needs to be fixed, and returning an error would not help much, so Open-MX just aborts.
If you do not find your answer here, feel free to contact the open-mx mailing list.
Last updated on 2010/10/15.