afni receives SIGBUS after a few seconds when using apptainer exec

Hi there. I am trying to install AFNI on behalf of a user of our HPC cluster. Apologies in advance because our setup has quite a few layers, and I'm guessing the answer relies on a strong understanding of how afni interacts with an Apptainer container.

In short, I have pulled the existing AFNI_24.1.22 Docker container using Apptainer instead of Docker. We offer remote desktops via noVNC. These desktops run on Rocky 9.2.

Here comes the weird part. If I run afni by first shelling into the container, then it works as expected. The windows pop up and persist. My commands and output are below:

[lexg@m3t106 blah]$ apptainer shell $AFNI_SIF
Apptainer> afni
Precompiled binary linux_ubuntu_16_64_glw_local_shared: Jun 21 2024 (Version AFNI_24.1.22 'Publius Septimius Geta')


** Version check disabled: AFNI_VERSION_CHECK forbids
Thanks go to A Clark for awe-inspiring caffeine binging

Initializing: X11[TigerVNC v 11400000].
++++++++ IMAGE SAVE SETUP WARNINGS ++++++++
++ Can't find program ffmpeg for Save to MPEG-1
++ To disable these warnings, set environment
++  variable AFNI_IMSAVE_WARNINGS to 'NO'.
+++++++++++++++++++++++++++++++++++++++++++++
. Widgets.
++ WARNING: Can't find default atlas (Brodmann_Pijn_AFNI) dataset for 'whereami_afni'!
++--------- See https://afni.nimh.nih.gov/pub/dist/data/
..... Input files:
** Searching subdirectories of './' for data
** No datasets found -- making up something **
 Timeseries.1D = 0 files read
 .[tc]sv data  = 0 files read
Path(s) to be searched for plugins: 
/opt/afni/install /opt/afni/install/../lib 

 Plugins       = 49 libraries read
 write compress= GZIP
++ AFNI is detached from terminal.
Apptainer> 
++ NOTE: This version of AFNI was built Jun 21 2024 ++
++ NOTE: 'Define Markers' is hidden: right-click 'DataDir' to see it
++ NOTE: Use '-seehidden' option to see which plugins are hidden


------------------------- AFNI Startup Tip (44/112)----------------------------
Left-click in the square right of 'Etc->' in an AFNI controller will
 popup a copy of the splash screen again. Another left-click there will
 pop the splash window down again. Clicking in the reincarnated splash screen
 may give funny results.
Right-click in that square will give a menu with some fun choices.
Middle-click in that square will popup a random insult.
-------------------------------------------------------------------------------

However, if I then run afni directly by using apptainer exec instead of apptainer shell, the windows pop up for a few seconds, but then the program crashes:

[lexg@m3t106 blah]$ apptainer exec $AFNI_SIF afni
Precompiled binary linux_ubuntu_16_64_glw_local_shared: Jun 21 2024 (Version AFNI_24.1.22 'Publius Septimius Geta')


** Version check disabled: AFNI_VERSION_CHECK forbids
Thanks go to EA DeYoe for many suggestions

Initializing: X11[TigerVNC v 11400000].
++++++++ IMAGE SAVE SETUP WARNINGS ++++++++
++ Can't find program ffmpeg for Save to MPEG-1
++ To disable these warnings, set environment
++  variable AFNI_IMSAVE_WARNINGS to 'NO'.
+++++++++++++++++++++++++++++++++++++++++++++
. Widgets.
++ WARNING: Can't find default atlas (Brodmann_Pijn_AFNI) dataset for 'whereami_afni'!
++--------- See https://afni.nimh.nih.gov/pub/dist/data/
..... Input files:
** Searching subdirectories of './' for data
** No datasets found -- making up something **
 Timeseries.1D = 0 files read
 .[tc]sv data  = 0 files read
Path(s) to be searched for plugins: 
/opt/afni/install /opt/afni/install/../lib 

 Plugins       = 49 libraries read
 write compress= GZIP
++ AFNI is detached from terminal.
[lexg@m3t106 blah]$ 
++ NOTE: This version of AFNI was built Jun 21 2024 ++
++ NOTE: 'Define Markers' is hidden: right-click 'DataDir' to see it
++ NOTE: Use '-seehidden' option to see which plugins are hidden


------------------------- AFNI Startup Tip (80/112)----------------------------

Fatal Signal 7 (SIGBUS) received
Last STATUS: splashed down
   AFNI_startup_timeout_CB
  AFNI:main
 Bottom of Debug Stack
** AFNI version = AFNI_24.1.22  Compile date = Jun 21 2024
** [[Precompiled binary linux_ubuntu_16_64_glw_local_shared: Jun 21 2024]]
** Program Death **
** If you report this crash to the AFNI message board,
** please copy the error messages EXACTLY, and give
** the command line you used to run the program, and
** any other information needed to repeat the problem.
** You may later be asked to upload data to help debug.
** Crash log is appended to file /home/lexg/.afni.crashlog

The crash log is:

*********------ CRASH LOG ------------------------------***********
Fatal Signal 7 (SIGBUS) received
.......... recent internal history .........................................
----MCW_popup_message [4]: EXIT} (file=xutil.c line=644) to AFNI_startup_timeout_CB {2688 ms}
++++AFNI_vedit_CB [4]: {ENTRY (file=afni_widg.c line=7702) from AFNI_startup_timeout_CB {4552 ms}
+++++AFNI_misc_CB [5]: {ENTRY (file=afni_func.c line=7265) from AFNI_vedit_CB {4552 ms}
++++++AFNI_controller_index [6]: {ENTRY (file=afni_widg.c line=6333) from AFNI_misc_CB {4552 ms}
------AFNI_controller_index [6]: EXIT} (file=afni_widg.c line=6338) to AFNI_misc_CB {4552 ms}
++++++AFNI_controller_index [6]: {ENTRY (file=afni_widg.c line=6333) from AFNI_misc_CB {4552 ms}
------AFNI_controller_index [6]: EXIT} (file=afni_widg.c line=6338) to AFNI_misc_CB {4552 ms}
-----AFNI_misc_CB [5]: EXIT} (file=afni_func.c line=7707) to AFNI_vedit_CB {4552 ms}
----AFNI_vedit_CB [4]: EXIT} (file=afni_widg.c line=7772) to AFNI_startup_timeout_CB {4552 ms}
++++AFNI_set_cursor [4]: {ENTRY (file=afni.c line=13478) from AFNI_startup_timeout_CB {4753 ms}
----AFNI_set_cursor [4]: EXIT} (file=afni.c line=13554) to AFNI_startup_timeout_CB {4758 ms}
++++AFNI_quit_CB [4]: {ENTRY (file=afni.c line=3322) from AFNI_startup_timeout_CB {4758 ms}
----AFNI_quit_CB [4]: EXIT} (file=afni.c line=3337) to AFNI_startup_timeout_CB {4758 ms}
++++AFNI_controller_index [4]: {ENTRY (file=afni_widg.c line=6333) from AFNI_startup_timeout_CB {4758 ms}
----AFNI_controller_index [4]: EXIT} (file=afni_widg.c line=6338) to AFNI_startup_timeout_CB {4758 ms}
++++AFNI_coord_filer_setup [4]: {ENTRY (file=afni_filer.c line=49) from AFNI_startup_timeout_CB {4758 ms}
+++++AFNI_controller_index [5]: {ENTRY (file=afni_widg.c line=6333) from AFNI_coord_filer_setup {4758 ms}
-----AFNI_controller_index [5]: EXIT} (file=afni_widg.c line=6338) to AFNI_coord_filer_setup {4758 ms}
----AFNI_coord_filer_setup [4]: EXIT} (file=afni_filer.c line=57) to AFNI_startup_timeout_CB {4758 ms}
++++AFNI_splashdown [4]: {ENTRY (file=afni_splash.c line=95) from AFNI_startup_timeout_CB {4758 ms}
+++++SPLASH_popup_image [5]: {ENTRY (file=afni_splash.c line=539) from AFNI_splashdown {4758 ms}
     SPLASH_popup_image -- unrealizing splash window {4758 ms}
++++++drive_MCW_imseq [6]: {ENTRY (file=imseq.c line=8144) from SPLASH_popup_image {4758 ms}
+++++++ISQ_timer_stop [7]: {ENTRY (file=imseq.c line=13366) from drive_MCW_imseq {4758 ms}
-------ISQ_timer_stop [7]: EXIT} (file=imseq.c line=13370) to drive_MCW_imseq {4758 ms}
------drive_MCW_imseq [6]: EXIT} (file=imseq.c line=9068) to SPLASH_popup_image {4758 ms}
-----SPLASH_popup_image [5]: EXIT} (file=afni_splash.c line=550) to AFNI_splashdown {4758 ms}
+++++mri_free [5]: {ENTRY (file=mri_free.c line=49) from AFNI_splashdown {4758 ms}
-----mri_free [5]: EXIT} (file=mri_free.c line=50) to AFNI_splashdown {4758 ms}
----AFNI_splashdown [4]: EXIT} (file=afni_splash.c line=121) to AFNI_startup_timeout_CB {4758 ms}
   AFNI_startup_timeout_CB -- splashed down {4758 ms}
............................................................................
Last STATUS: splashed down
   AFNI_startup_timeout_CB
  AFNI:main
** AFNI compile date = Jun 21 2024
** [[Precompiled binary linux_ubuntu_16_64_glw_local_shared: Jun 21 2024]]
** Program Crash **

Here is the output of a system check:

[lexg@m3t106 blah]$ apptainer exec $AFNI_SIF afni_system_check.py -check_all
-------------------------------- general ---------------------------------
architecture:         64bit ELF
cpu type:             x86_64
system:               Linux
release:              5.14.0-284.25.1.el9_2.x86_64
version:              #1 SMP PREEMPT_DYNAMIC Wed Aug 2 14:53:30 UTC 2023
distribution:         Ubuntu 18.04 bionic
number of CPUs:       4
apparent login shell: bash
shell RC file:        .bashrc (exists)

--------------------- AFNI and related program tests ---------------------
which afni           :
                     : AFNI_24.1.22 'Publius Septimius Geta'
AFNI_version.txt     : AFNI_24.1.22, linux_ubuntu_16_64_glw_local_shared, Jun 21 2024, local
which python         :
which R              :

instances of various programs found in PATH:
    afni    : 1   (/opt/afni/install/afni)
    R       : 1   (/usr/bin/R)
    python  : 1   (/usr/bin/python3.6)
    python2 : 0 
    python3 : 1   (/usr/bin/python3.6)

** have python3 but not python2

testing ability to start various programs...
    afni                 : success
    suma                 : success
    3dSkullStrip         : success
    3dAllineate          : success
    3dRSFC               : success
    SurfMesh             : success
    3dClustSim           : success
    uber_subject.py      : success
    3dMVM                : FAILURE
        Error in library(data.table) : there is no package called \u2018data.table\u2019
        Calls: source ... suppressPackageStartupMessages -> withCallingHandlers -> library
        Execution halted

have failures, testing programs under implied /opt/afni/install...
    afni                 : success
    suma                 : success
    3dSkullStrip         : success
    3dAllineate          : success
    3dRSFC               : success
    SurfMesh             : success
    3dClustSim           : success
    uber_subject.py      : success
    3dMVM                : FAILURE
        Error in library(data.table) : there is no package called \u2018data.table\u2019
        Calls: source ... suppressPackageStartupMessages -> withCallingHandlers -> library
        Execution halted

------------------------ dependent program tests -------------------------
checking for dependent programs...

which tcsh           :
which Xvfb           :

checking for R packages...
    rPkgsInstall -pkgs ALL -check : FAILURE
        
        oo Warning: 
           These packages are not installed on the computer: afex!
        These packages are not installed on the computer: phia!
        These packages are not installed on the computer: snow!
        These packages are not installed on the computer: lme4!
        These packages are not installed on the computer: lmerTest!
        These packages are not installed on the computer: gamm4!
        These packages are not installed on the computer: data.table!
        These packages are not installed on the computer: paran!
        These packages are not installed on the computer: psych!
        These packages are not installed on the computer: brms!
        These packages are not installed on the computer: corrplot!
        These packages are not installed on the computer: metafor!
        

R RHOME : /usr/lib/R

------------------------------ python libs -------------------------------

++ module loaded: matplotlib.pyplot
   module file : /opt/user_pip_packages/lib/python3.6/site-packages/matplotlib/pyplot.py
   matplotlib version : 3.3.4

** failed to load module flask
-- flask is not required, but is desirable

** failed to load module flask_cors
-- flask_cors is not required, but is desirable

-------------------------------- env vars --------------------------------
PATH                       = /opt/afni/src/../install:/opt/cmake/cmake-3.14.7-Linux-x86_64/bin:/opt/user_pip_packages/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

PYTHONPATH                 = /usr/local/strudel2/submit
R_LIBS                     = 
LD_LIBRARY_PATH            = /.singularity.d/libs
DYLD_LIBRARY_PATH          = 
DYLD_FALLBACK_LIBRARY_PATH = 
CONDA_SHLVL                = 
CONDA_DEFAULT_ENV          = 

----------------------------- eval dot files -----------------------------

----------- AFNI $HOME files -----------

    .afnirc                   : found
    .sumarc                   : missing
    .afni/help/all_progs.COMP : missing

--------- shell startup files ----------

   ** error: have no found abin, so please use -dir_bin
------------------------------ data checks -------------------------------
data dir : missing AFNI_data6
data dir : missing AFNI_demos
data dir : missing suma_demo
data dir : missing afni_handouts
atlas    : did not find TT_N27+tlrc

------------------------------ OS specific -------------------------------

have Ubuntu system: Ubuntu 18.04 bionic

=========================  summary, please fix:  =========================
*  just be aware: login shell 'bash', but our code examples use 'tcsh'
*  missing program: afni
*  missing program: python
*  missing program: R
*  failure under initial "AFNI and related program tests"
*  AFNI programs show FAILURE
*  consider adding /opt/afni/install to your PATH
*  missing program: tcsh
*  missing program: Xvfb
*  missing R packages (see rPkgsInstall)
*  please run: "suma -update_env" for .sumarc
*  please run: apsearch -update_all_afni_help
*  failure running init_user_dotfiles.py -test
*  insufficient data for AFNI bootcamp
   (see "Prepare for Bootcamp" on install pages)
*  possibly missing atlases

There is zero difference between running this check with exec vs shell.

I should note that the user has reported that the non-GUI executables work just fine via apptainer exec. I compared printenv | sort when running with shell vs exec and saw no significant differences,

[lexg@m3-login3 blah]$ diff *.txt
15c15
< APPTAINER_COMMAND=exec
---
> APPTAINER_COMMAND=shell
43d42
<  eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@
44a44
>  eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@
100d99
< PROMPT_COMMAND=PS1="Apptainer> "; unset PROMPT_COMMAND
118c117
< SHLVL=3
---
> SHLVL=5
170c169
< _=/usr/bin/apptainer
---
> _=/usr/bin/printenv

and I don't believe the filesystem bindings should differ between the two.
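
For reference, a comparison like the one above could be scripted roughly as follows. This is only a sketch: the helper names dump_env and compare_env and the output file names are made up for illustration, and it assumes printenv and sort are available inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the sorted environment seen by a launcher.
# Usage: dump_env OUTFILE LAUNCHER [ARGS...]
#   e.g. dump_env env-exec.txt apptainer exec "$AFNI_SIF"
dump_env() {
    local out=$1
    shift
    # Run printenv through the launcher and store the sorted result.
    "$@" printenv | sort > "$out"
}

# Hypothetical helper: show only the lines that differ between two dumps.
compare_env() {
    # diff exits with status 1 when the files differ; that is not an error here.
    diff "$1" "$2" || true
}
```

For the shell case, running `printenv | sort > env-shell.txt` at the Apptainer> prompt gives the second file, and `compare_env env-exec.txt env-shell.txt` then prints the differences.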

I also set export AFNI_TRACE=Y and here is the tail of the output before it crashes:

++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {4051 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {4051 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=141) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {4051 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {4051 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=141) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {4051 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {4051 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {4051 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {4051 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {4051 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {4051 ms}

Fatal Signal 7 (SIGBUS) received
    AFNI_read_inputs
   MAIN_workprocess
  AFNI:main
 Bottom of Debug Stack
** AFNI version = AFNI_24.1.22  Compile date = Jun 21 2024
** [[Precompiled binary linux_ubuntu_16_64_glw_local_shared: Jun 21 2024]]
** Program Death **
** If you report this crash to the AFNI message board,
** please copy the error messages EXACTLY, and give
** the command line you used to run the program, and
** any other information needed to repeat the problem.
** You may later be asked to upload data to help debug.
** Crash log is appended to file /home/lexg/.afni.crashlog

Obviously when running with shell, it gets much further and I instead see

-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {3932 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {3932 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=141) to AFNI_read_inputs {3932 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {3932 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {3932 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {3932 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {3932 ms}
+++++set_session_dset [5]: {ENTRY (file=thd_warp_tables.c line=81) from AFNI_read_inputs {3932 ms}
++++++get_session_dset [6]: {ENTRY (file=thd_warp_tables.c line=48) from set_session_dset {3932 ms}
------get_session_dset [6]: EXIT} (file=thd_warp_tables.c line=67) to set_session_dset {3932 ms}
-----set_session_dset [5]: EXIT} (file=thd_warp_tables.c line=161) to AFNI_read_inputs {3932 ms}
+++++THD_dummy_N27 [5]: {ENTRY (file=thd_dumdset.c line=5858) from AFNI_read_inputs {3932 ms}
++++++EDIT_empty_copy [6]: {ENTRY (file=edt_emptycopy.c line=29) from THD_dummy_N27 {3932 ms}
+++++++THD_init_diskptr_names [7]: {ENTRY (file=thd_initdkptr.c line=27) from EDIT_empty_copy {3932 ms}
-------THD_init_diskptr_names [7]: EXIT} (file=thd_initdkptr.c line=148) to EDIT_empty_copy {3932 ms}
+++++++THD_init_datablock_brick [7]: {ENTRY (file=thd_initdblk.c line=1206) from EDIT_empty_copy {3932 ms}
       THD_init_datablock_brick -- making dblk->brick_bytes {3932 ms}
       THD_init_datablock_brick -- making dblk->brick_fac {3932 ms}
       THD_init_datablock_brick -- making new dblk->brick {3932 ms}
       THD_init_datablock_brick -- starting sub-brick creations {3932 ms}
++++++++mri_new_7D_generic [8]: {ENTRY (file=mri_new.c line=48) from THD_init_datablock_brick {3932 ms}
        mri_new_7D_generic -- nx=2 ny=2 nz=2 kind=1 bytes=16 (null) {3932 ms}
--------mri_new_7D_generic [8]: EXIT} (file=mri_new.c line=143) to THD_init_datablock_brick {3932 ms}
       THD_init_datablock_brick -- exiting {3932 ms}
-------THD_init_datablock_brick [7]: EXIT} (file=thd_initdblk.c line=1268) to EDIT_empty_copy {3932 ms}
------EDIT_empty_copy [6]: EXIT} (file=edt_emptycopy.c line=227) to THD_dummy_N27 {3932 ms}

I'm really at a loss for what's going wrong here or how to resolve it. Some ideas I've had are:

  1. I see that afni is launching TigerVNC inside the container. I don't know how this interacts with noVNC outside the container.
  2. I see that afni "detaches from the terminal". I wondered if this could explain the difference between shell vs exec, since detaching in shell still leaves me inside the container shell, whereas detaching in exec may be taking me outside the container. I could not find any environment variables to disable this detachment though, and it may be irrelevant anyway.
  3. The container still seems to be missing lots of dependencies, mainly R libraries, but this doesn't seem to explain the discrepancy between shell and exec.

Apologies for the barrage of information, just trying to be thorough. Would very much appreciate anyone's input on this!

Set an environment variable AFNI_ATLAS_PATH to point to the location of the atlas and template datasets. It seems you are bypassing that with that method. You can also put that into the .afnirc file or into a shell environment.

I was right! I'll leave my initial response for the record but I was able to fix this issue by setting export AFNI_DETACH=NO.

Thanks for the quick response.

I tried setting AFNI_ATLAS_PATH to the path to afni_atlases_dist (which I downloaded from 11.3. Other templates and atlases - AFNI, SUMA and FATCAT: v24.1.22), and this improves the system check slightly to include

atlas    : found TT_N27+tlrc  under /afni/afni_atlases_dist

(note that is the filepath inside the container). However this does not change afni from crashing when run via apptainer exec, whereas it still runs fine with apptainer shell. I still suspect it's related to afni detaching from the terminal and thus "forgetting" about the container's filesystem.
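
One way to get the atlases to that in-container path is to bind them in and set the variable at launch. The sketch below is an assumption about how this could be done, not a record of the exact command used: HOST_ATLAS_DIR is a hypothetical variable holding the host location of the unpacked afni_atlases_dist, and the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command in the AFNI container with the atlas
# directory bound in and AFNI_ATLAS_PATH set to the in-container path.
# HOST_ATLAS_DIR (assumed) is where afni_atlases_dist was unpacked on the host.
run_with_atlases() {
    local host_dir=${HOST_ATLAS_DIR:?set HOST_ATLAS_DIR to the unpacked afni_atlases_dist}
    local container_dir=/afni/afni_atlases_dist   # path quoted in this thread
    apptainer exec \
        --bind "${host_dir}:${container_dir}" \
        --env "AFNI_ATLAS_PATH=${container_dir}" \
        "$AFNI_SIF" "$@"
}
```

For example, `run_with_atlases afni_system_check.py -check_all` would repeat the system check with the atlases visible.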

To test this, I watched ps aux every 0.1 seconds when running afni via apptainer shell vs apptainer exec and might have caught what I was suspecting? When running with apptainer exec, it starts off with something like

lexg     3664824  0.0  0.0   7256  3568 pts/72   S+   11:52   0:00 /bin/bash /apps/afni/24.1.22/bin/afni
lexg     3664834  2.6  0.0 1253356 25684 pts/72  Sl+  11:52   0:00 Apptainer runtime parent
lexg     3664861  1.3  0.0  85200  4892 pts/72   S+   11:52   0:00 /opt/afni/install/afni
lexg     3664876 30.3  0.0 377236 14808 pts/72   Sl+  11:52   0:00 /usr/libexec/apptainer/bin/squashfuse_ll -f -o uid=15673,gid=10025,offset=40960 /proc/self/fd/3 /var/apptainer/mnt/session/rootfs
lexg     3664881  9.0  0.0   6708  4696 pts/72   S+   11:52   0:00 /usr/bin/fuse-overlayfs -f -o lowerdir=/var/apptainer/mnt/session/overlay-lowerdir:/var/apptainer/mnt/session/rootfs,noacl /var/apptainer/mnt/session/final
lexg     3664919  4.5  0.0  92160  8948 pts/72   S+   11:52   0:00 /opt/afni/install/afni

Note the /usr/bin/fuse-overlayfs is running concurrently with afni. But just before it crashes, I see only

lexg     3664824  0.0  0.0   7256  3568 pts/72   S+   11:52   0:00 /bin/bash /apps/afni/24.1.22/bin/afni
lexg     3664834  0.0  0.0  20892  5484 pts/72   S+   11:52   0:00 Apptainer runtime parent
lexg     3664850  0.0  0.0  19184  3704 pts/74   R+   11:52   0:00 ps aux 
lexg     3664851  0.0  0.0   6412  2256 pts/74   S+   11:52   0:00 grep lexg
lexg     3664853  0.0  0.0 1121700 20212 pts/72  Sl+  11:52   0:00 Apptainer runtime parent

without /usr/bin/fuse-overlayfs. Comparing when running with apptainer shell, the very last ps aux before I quit the GUI after having it open for ~10 seconds is simply

lexg     3665751  1.0  0.0 1253100 24984 pts/72  Sl   11:55   0:00 Apptainer runtime parent
lexg     3665777  0.4  0.0  20588  4116 pts/72   S+   11:55   0:00 /bin/bash --norc
lexg     3665790 12.0  0.0 302340 17520 pts/72   Sl   11:55   0:01 /usr/libexec/apptainer/bin/squashfuse_ll -f -o uid=15673,gid=10025,offset=40960 /proc/self/fd/3 /var/apptainer/mnt/session/rootfs
lexg     3665795  3.5  0.0   7048  4672 pts/72   S    11:55   0:00 /usr/bin/fuse-overlayfs -f -o lowerdir=/var/apptainer/mnt/session/overlay-lowerdir:/var/apptainer/mnt/session/rootfs,noacl /var/apptainer/mnt/session/final
lexg     3665807  0.7  0.0   7256  3908 pts/74   S+   11:55   0:00 /bin/bash ./watch-procs.sh output-shell
lexg     3665853  4.6  0.0 104828 26356 pts/72   S    11:55   0:00 afni

so the /usr/bin/fuse-overlayfs was still running to the very end. I understand that if this really is the issue, then this is probably more of a question for apptainer experts at this point...
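
The watch-procs.sh script visible in the listing above was not posted. A minimal sketch of what such a watcher might look like, with the function name, arguments, and sample-header format all being assumptions:

```shell
#!/usr/bin/env bash
# Sketch of a process watcher (the real watch-procs.sh was not posted).
# Usage: watch_procs OUTFILE [INTERVAL_SECONDS] [SAMPLES]
watch_procs() {
    local out=$1 interval=${2:-0.1} samples=${3:-100}
    local i
    : > "$out"                                  # start with an empty log
    for ((i = 0; i < samples; i++)); do
        printf '=== sample %d ===\n' "$i" >> "$out"
        # Record only the current user's processes for this sample.
        ps -o user,pid,stat,start,args -u "$(id -u)" >> "$out" 2>/dev/null || true
        sleep "$interval"
    done
}
```

Started in a second terminal just before launching afni, for instance as `watch_procs output-exec 0.1 200`, the log can then be searched for the last sample in which fuse-overlayfs still appears.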

But never mind, I have found a solution! Originally I only saw a list of AFNI environment variables at 3.5. List of all AFNI environment variables - AFNI, SUMA and FATCAT: v22.3.03. Neither of those mentions detaching. But I searched the afni source code for "detach" and found afni/src/afni_startup_tips.h at bbcac392ad05617a409ed41505ea8a7efc924f58 · afni/afni · GitHub, which revealed an environment variable, AFNI_DETACH, that controls whether afni detaches from the terminal. Setting AFNI_DETACH=NO resolved my issue! I then saw that this was indeed documented at 3.6. List of all startup tips - AFNI, SUMA and FATCAT: v22.3.03.
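
For anyone wanting to bake this into a launcher, here is a minimal sketch. It assumes AFNI_SIF points at the image; the function name is made up, and the real contents of the /apps/afni/24.1.22/bin/afni wrapper were not posted. Passing the variable with --env is an alternative to exporting it on the host beforehand.

```shell
#!/usr/bin/env bash
# Hypothetical launcher; AFNI_SIF is assumed to hold the path to the .sif image.
afni_in_container() {
    # AFNI_DETACH=NO keeps afni attached to the terminal, so the command that
    # apptainer exec started does not return until the GUI is closed.
    apptainer exec --env AFNI_DETACH=NO "$AFNI_SIF" afni "$@"
}
```

Any extra arguments are passed through to afni unchanged, for example `afni_in_container -seehidden`.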

I think the "3.5. List of all AFNI environment variables" page should be updated to include AFNI_DETACH, otherwise the "all" is a bit misleading.

Thanks anyway for your help dglen!

Great job finding that problem! The fork function is used for the detaching, but that doesn't get called with AFNI_DETACH set to NO or with the -no_detach option for afni. This detaching feature was added a little over 10 years ago, and it almost always works out without requiring users to run the program in the background.
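
As a loose shell analogy of the detaching behaviour (this is not AFNI's code, and fake_detach is a made-up name), the sketch below starts a background "child" and lets the "parent" return at once. A launcher that waits only on the parent would treat the program as finished while the child is still running, which is consistent with what the ps listings in this thread suggest happened under apptainer exec.

```shell
#!/usr/bin/env bash
# Analogy only, not AFNI code: a "parent" that starts a background "child"
# and returns immediately, the way a detaching program hands control back.
# Usage: fake_detach MARKER_FILE [DELAY_SECONDS]
fake_detach() {
    local marker=$1 delay=${2:-1}
    # Child: keeps running after the parent has returned, then writes a marker.
    ( sleep "$delay"; echo "child done" > "$marker" ) > /dev/null 2>&1 &
    # Parent: returns right away; whoever launched it sees it finish here.
    echo "parent exiting"
}
```

Calling `fake_detach /tmp/marker 1` prints "parent exiting" immediately, while the marker file only appears about a second later.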