- Created by Unknown User (nagc), last modified on Jan 26, 2015
There is an issue with OpenIFS when compiling with the Intel compiler at optimization level -O2 or above on chipsets that support SSE4.1 & AVX instructions.
Users will see a failure in the T21 test job similar to the following:
signal_harakiri(SIGALRM=14): New handler installed at 0x432110; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 6.18
myproc#1,tid#1,pid#27600,signal#8(SIGFPE): Received signal :: 123MB (heap), 125MB (rss), 0MB (stack), 0 (paging), nsigs 1, time 6.18
tid#1 starting drhook traceback, time = 6.18
myproc#1,tid#1,pid#27600: MASTER
myproc#1,tid#1,pid#27600: CNT0<1>
myproc#1,tid#1,pid#27600: CNT1
myproc#1,tid#1,pid#27600: CNT2
myproc#1,tid#1,pid#27600: CNT3
myproc#1,tid#1,pid#27600: CNT4
myproc#1,tid#1,pid#27600: STEPO
myproc#1,tid#1,pid#27600: SCAN2H
myproc#1,tid#1,pid#27600: SCAN2M
myproc#1,tid#1,pid#27600: GP_MODEL
myproc#1,tid#1,pid#27600: EC_PHYS_DRV
myproc#1,tid#1,pid#27600: >OMP-PHYSICS CLDPP T/S (1002)
myproc#1,tid#1,pid#27600: EC_PHYS
myproc#1,tid#1,pid#27600: CALLPAR
myproc#1,tid#1,pid#27600: SLTEND
It arises because the Intel compiler uses 2-way vectorization when compiling both branches of IF statements, which can generate floating point exceptions if a division by zero is possible in the branch that is not executed.
There are several possible workarounds:
- Compile the routines that cause the problem with lower optimisation, -O1. The routines affected are: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
- Run with the environment variable: DR_HOOK_IGNORE_SIGNALS=8 to disable trapping of floating point exception signals (SIGFPE) by the model. This is not ideal as it will not catch other causes of floating point exceptions.
- Edit the code and insert the line:
  !DEC$ OPTIMIZE:1
  directly after the SUBROUTINE statement in each of the routines: sltend.F90, vsurf_mod.F90, vdfmain.F90, vdfhghtn.F90.
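The source edit described above could be scripted rather than done by hand. The following is a minimal sketch, not part of OpenIFS: `insert_directive` is a hypothetical helper name, and the sketch assumes free-form source in which the first SUBROUTINE statement starts a line (no RECURSIVE or PURE prefix) and may continue over several lines ending in `&`. It prints the modified source to standard output so the result can be inspected before the original file is replaced.

```shell
#!/bin/sh
# insert_directive FILE
# Hypothetical helper (not part of OpenIFS): prints FILE with the line
# "!DEC$ OPTIMIZE:1" added after the end of the first SUBROUTINE statement.
insert_directive() {
  awk '
    # state 0: searching for SUBROUTINE, 1: inside the statement, 2: done
    {
      print
      if (state == 0 && toupper($0) ~ /^[ \t]*SUBROUTINE[ \t]/) state = 1
      if (state == 1) {
        line = $0
        sub(/!.*$/, "", line)      # ignore any trailing comment
        sub(/[ \t]+$/, "", line)   # ignore trailing blanks
        # A statement that does not end in "&" is complete.
        if (line !~ /&$/) { print "!DEC$ OPTIMIZE:1"; state = 2 }
      }
    }
  ' "$1"
}
```

For example, `insert_directive sltend.F90 > sltend.F90.new` writes the edited copy to a new file; the location of each routine within the source tree is left to the reader.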
For more help with this issue, please contact openifs-support@ecmwf.int.
We are aware of a problem in grib_api when using the Intel compiler that seems to affect different versions of grib_api and causes the model to fail with a floating point exception (SIGFPE) in the routine PRESET_GRIB_TEMPLATE. The advice is to reduce the optimization level when compiling grib_api to -O1 rather than -O2.
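As a sketch of how the lower optimization level might be applied, the fragment below sets the standard autoconf variables before configuring grib_api. It assumes an autotools-based grib_api build and the Intel compiler driver names icc and ifort; the variable names and the install prefix do not come from the OpenIFS documentation, so check them against the installation notes for the grib_api version in use.

```shell
# Assumed Intel compiler driver names; adjust to the local installation.
export CC=icc
export FC=ifort
# Use -O1 instead of -O2 for both the C and Fortran parts of grib_api.
export CFLAGS="-O1"
export FCFLAGS="-O1"
# Then configure and build as usual, for example (hypothetical prefix):
#   ./configure --prefix="$HOME/grib_api" && make && make install
```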
The error message that typifies this problem is:
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 3.10
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0
***Received signal = 8 and ActivatED SIGALRM=14 and calling alarm(10), time = 3.10
[myproc#1,tid#1,pid#14063]: MASTER
[myproc#1,tid#1,pid#14063]: CNT0<1>
[myproc#1,tid#1,pid#14063]: SU0YOMB
[myproc#1,tid#1,pid#14063]: SU_GRIB_API
[myproc#1,tid#1,pid#14063]: PRESET_GRIB_TEMPLATE
JSETSIG: sl->active = 0
signal_harakiri(SIGALRM=14): New handler installed at 0xabfa00; old preserved at 0x0

tail NODE.001_01
- Set up F-post processing, part 2----------------------------------
YDSL%CVER=FP
YDSL%NASLB1= 1053
YDSL%NASLB1_TRUE= 79
*** YRFP%NASLB1 RESET TO NPROMA*NGPBLKS= 48
THE POST-PROCESSING RESOLUTION IS NEVER COARSER THAN THE MODEL RESOLUTION
ARRAY SSEC2 ALLOCATED 132 132
SUBFPOS: case LFPDISTRIB=F
NFPROMA=NFPROMA_DEP; NFPBLOCS=NFPBLOCS_DEP
NFPSTART=NFPSTART_DEP; NFPEND=NFPEND_DEP
NFPSORT=NFPSORT_DEP; NFPBLOFF=NFPBLOFF_DEP
SUFPIOS PRINTS OUT NFPXFLD = -999
- Set up GRIB API usage----------------------------------
ABOR1 CALLED
Dr.Hook calls ABOR1 ...
This is caused by the way IFS creates its own signal handler. To enable Cray ATP, set:
export DR_HOOK_IGNORE_SIGNALS=-1
in the job script to completely disable any signal trapping by DrHook.
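In a job script this might look like the following sketch. The `ATP_ENABLED=1` setting is the switch that Cray documents for turning ATP on; it does not appear in the OpenIFS instructions above and is included here as an assumption about the target system.

```shell
# Disable all signal trapping by DrHook so that signals reach Cray ATP.
export DR_HOOK_IGNORE_SIGNALS=-1
# Assumption: ATP is switched on through this Cray environment variable.
export ATP_ENABLED=1
```

Both variables need to be set before the model executable is launched.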
This issue has been fixed in OpenIFS releases 38r1v05 and beyond. For previous releases, either use the fix above or contact openifs-support@ecmwf.int for assistance.
This is a result of the way in which OpenIFS is compiled. More information on this and its resolution is described here.