Shut down application with worker in unusable state

What is the correct way to shut down an application via the ACI once a worker has entered the UNUSABLE state?

I have written tests to verify that a component with invalid properties causes the start method to return RCC_FATAL with the correct error message. There are about 10 tests, each verifying that a single incorrectly set property causes this behaviour.
The tests themselves work fine; I just set the log level to 0 to suppress the output. However, once they are complete, OpenCPI outputs the following for each test:
***Exception during shutdown: Control operation "release" failed in state "UNUSABLE": worker "rx_periodic_na_xs" in container rcc0 from artifact worker rx_periodic_na_xsrcc***
followed by a long stack trace.

Having to scroll past 200+ lines of useless output just to see the output from the tests is pretty inconvenient.

Alternatively, is there a way to disable this behaviour of printing the stack trace?

Many thanks,
Dan

To clarify, you are creating an OCPI::API::Application that you know will RCC_FATAL during its start() method?

Could you be a bit more specific with how you are currently shutting it down?

Yes that’s correct.
An application is created for each test case and is shut down by calling app.finish() followed by the destructor. Without app.finish() the result is the same.

I’m fairly convinced this is a bug.

As far as I was aware, if any worker start() method returns an RCC_FATAL, the application should immediately die. It should not ever get to the release() call, where it then stack traces because the app shouldn’t have ever been allowed to run in the first place.

I made an MCVE for this:

Can you check that the start_causes_fatal.rcc worker is doing basically the same thing you are?

I then made an app with only that worker, and ran it with ocpirun and I got a stack trace on release():

OCPI( 8:290.0528): Error Exception: Control operation "release" failed in state "UNUSABLE": worker "start_causes_fatal" in container rcc0 from artifact worker start_causes_fatalrcc
OCPI( 2:290.0528): Exception during application shutdown: Control operation "release" failed in state "UNUSABLE": worker "start_causes_fatal" in container rcc0 from artifact worker start_causes_fatalrcc
Exiting for exception: Code 0x18, level 1, error: 'Worker "start_causes_fatal" produced an error during the "start" control operation: returned RCC_FATAL'
OCPI( 8:290.0528): HDL Simulator Shutdown.  Current pid: 0

***Exception during shutdown: Control operation "release" failed in state "UNUSABLE": worker "start_causes_fatal" in container rcc0 from artifact worker start_causes_fatalrcc***
ocpirun(_ZN4OCPI2OS9dumpStackEv+0x23) [0x5599a123a8a8]
ocpirun(_ZN4OCPI4Util5Error10setFormatVEPKcP13__va_list_tag+0xd1) [0x5599a121c78f]
ocpirun(_ZN4OCPI4Util5ErrorC1EPKcz+0xe1) [0x5599a121c3b5]
ocpirun(_ZN4OCPI9Container6Worker9controlOpENS_8Metadata6Worker16ControlOperationE+0x564) [0x5599a114a76e]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC6WorkerD1Ev+0x115) [0x7f1a2346b547]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC6WorkerD0Ev+0x1c) [0x7f1a2346b87c]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI4Util6ParentINS_3RCC6WorkerEE14deleteChildrenEv+0x5d) [0x7f1a2347a459]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC11ApplicationD1Ev+0xa6) [0x7f1a23479a92]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC11ApplicationD0Ev+0x1c) [0x7f1a23479b00]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI4Util6ParentINS_3RCC11ApplicationEE14deleteChildrenEv+0x5d) [0x7f1a23469697]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC9ContainerD1Ev+0xf0) [0x7f1a234684c6]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC9ContainerD0Ev+0x1c) [0x7f1a23468588]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI4Util6ParentINS_3RCC9ContainerEE14deleteChildrenEv+0x5d) [0x7f1a23467383]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC6DriverD2Ev+0x6d) [0x7f1a23466b6f]
/home/opencpi/git/opencpi_clones/develop_26_06_23/exports/ubuntu22_04/lib/libocpi_rcc_s.so(_ZN4OCPI3RCC6DriverD0Ev+0x1c) [0x7f1a23466c14]
ocpirun(_ZN4OCPI4Util6ParentINS_9Container6DriverEE14deleteChildrenEv+0x5d) [0x5599a1158c55]
ocpirun(_ZN4OCPI9Container7ManagerD1Ev+0x61) [0x5599a11571b1]
ocpirun(_ZN4OCPI9Container7ManagerD0Ev+0x1c) [0x5599a1157264]
ocpirun(_ZN4OCPI4Base6Plugin14ManagerManager7cleanupEv+0xc5) [0x5599a11fd1c1]
ocpirun(_ZN4OCPI4Base6Plugin7cleanupD1Ev+0x15) [0x5599a11fdd57]
/lib/x86_64-linux-gnu/libc.so.6(+0x45495) [0x7f1a22e45495]
/lib/x86_64-linux-gnu/libc.so.6(on_exit+0) [0x7f1a22e45610]
ocpirun(_ZN4OCPI4Base18BaseCommandOptions3badEPKcz+0) [0x5599a1206a5e]
ocpirun(_ZN4OCPI4Base18BaseCommandOptions4mainEPPKcPFiS4_E+0xb1) [0x5599a1206925]
ocpirun(main+0x30) [0x5599a10feb58]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f1a22e29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f1a22e29e40]
ocpirun(_start+0x25) [0x5599a10fd6f5]
OCPI( 8:290.0532): Error Exception: Control operation "release" failed in state "UNUSABLE": worker "start_causes_fatal" in container rcc0 from artifact worker start_causes_fatalrcc

Yes this looks very similar to what I was getting.

So I did some digging on this.

It seems that when the Worker destructor ~Worker is called, it invokes the release() control operation, which then produces the stack trace because the worker was made UNUSABLE by start().

The destructor doesn’t check the state before doing these invocations, which it almost certainly should.
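To illustrate the missing check in isolation, here is a minimal self-contained sketch. MockWorker, State, Op and the log vector are hypothetical stand-ins invented for this example; they are not the OpenCPI classes, and the real state handling in the framework is more involved. The sketch only shows the idea of skipping the stop/release control operations in the destructor once the worker is UNUSABLE.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the OpenCPI types.
enum class State { INITIALIZED, OPERATING, SUSPENDED, UNUSABLE };
enum class Op { Start, Stop, Release };

class MockWorker {
public:
  // failOnStart simulates a worker whose start method returns a fatal result.
  MockWorker(bool failOnStart, std::vector<std::string> &log)
    : m_failOnStart(failOnStart), m_log(log) {}

  ~MockWorker() {
    try {
      // The guard: an UNUSABLE worker cannot accept control operations,
      // so do not attempt stop/release on it during teardown.
      if (m_state != State::UNUSABLE) {
        if (m_enabled) {
          m_enabled = false;
          controlOp(Op::Stop);
        }
        controlOp(Op::Release);
      }
    } catch (...) {
      // Destructors must not throw; record the failure instead.
      m_log.push_back("exception in destructor");
    }
  }

  void start() {
    if (m_failOnStart) {
      m_state = State::UNUSABLE; // a fatal start leaves the worker unusable
      throw std::runtime_error("start returned a fatal result");
    }
    controlOp(Op::Start);
    m_enabled = true;
  }

  State state() const { return m_state; }

private:
  // Rejects every control operation once the worker is UNUSABLE.
  void controlOp(Op op) {
    if (m_state == State::UNUSABLE)
      throw std::runtime_error("control operation failed in state UNUSABLE");
    switch (op) {
      case Op::Start:   m_state = State::OPERATING; m_log.push_back("start");   break;
      case Op::Stop:    m_state = State::SUSPENDED; m_log.push_back("stop");    break;
      case Op::Release: m_log.push_back("release"); break;
    }
  }

  bool m_failOnStart;
  std::vector<std::string> &m_log;
  State m_state = State::INITIALIZED;
  bool m_enabled = false;
};
```

With the guard in place, destroying a MockWorker whose start() failed leaves the log empty, while a worker that started normally still logs start, stop and release in that order.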

The following patch avoids the stack trace (just wraps the destructor control ops in a state check):

diff --git a/runtime/rcc/src/RccWorker.cc b/runtime/rcc/src/RccWorker.cc
index 9f31df23b..20f5be896 100644
--- a/runtime/rcc/src/RccWorker.cc
+++ b/runtime/rcc/src/RccWorker.cc
@@ -114,11 +114,13 @@ Worker::
 {
   // FIXME - this sort of thing should be generic and be reused in portError
   try {
-    if (enabled) {
-      enabled = false;
-      controlOp(OM::Worker::OpStop); // call base class that filters the op
+    if (this->getState() != OM::Worker::UNUSABLE) {
+      if (enabled) {
+        enabled = false;
+        controlOp(OM::Worker::OpStop); // call base class that filters the op
+      }
+      controlOp(OM::Worker::OpRelease); // call base class that filters the op
     }
-    controlOp(OM::Worker::OpRelease); // call base class that filters the op
   } catch(...) {
   }
 #ifdef EM_PORT_COMPLETE

This results in the following application closing messages:

OCPI( 8:511.0519): Error Exception: Control operation "release" failed in state "UNUSABLE": worker "start_causes_fatal" in container rcc0 from artifact worker start_causes_fatalrcc
OCPI( 2:511.0519): Exception during application shutdown: Control operation "release" failed in state "UNUSABLE": worker "start_causes_fatal" in container rcc0 from artifact worker start_causes_fatalrcc
Exiting for exception: Code 0x18, level 1, error: 'Worker "start_causes_fatal" produced an error during the "start" control operation: returned RCC_FATAL'
OCPI( 8:511.0519): HDL Simulator Shutdown.  Current pid: 0

The app then exits back to the shell with return code 1. The message referring to release() is slightly confusing here, but that’s because the framework wants to call release() to shut down properly and no longer can. You’ll notice that the penultimate line correctly identifies start() as the issue.

I’ll make a note to create a bug report on this with this patch as my suggested fix.


Reported with above patch as the recommended fix: