Erlang peculiarities


While working on my WorkerNet post, I stumbled across a weird behaviour with start_links, trap_exit and slave nodes.

Long Story (sorry, there is no short one)

As I was setting up a distributed test with slaves, I also wanted one gen_server to trap exits for the sake of its offspring, which I did not wish to put under a supervisor (shame on me ;). Suddenly, all of the tests stopped working! All of them were either timing out or reporting direct noprocs. Bewildered and wide-eyed at 23:40, I gave it a go with the dbg tracer and even went through some of the gen_server source.

No answer.

I chalked it up to the rpc calls to the remote nodes and tried printing out the process ids at each step. But no, it was a fact: my gen_servers died the instant they were created… Brooding over it, I tried some more but finally went to sleep. By then, I knew that the problem was caused by the following two snippets in combination with rpc calls to my local slave nodes:

start_link() ->
    gen_server:start_link({local,?MODULE},?MODULE,[],[]).    

init([]) ->
    process_flag(trap_exit,true),
    {ok, ok}.

The version without trap_exit, meanwhile, worked like a charm. Not wanting to waste more time on it, I just covered the problem up, like a cheap rug over a very dark, very deep and embarrassing hole in the floor, with

start_link(succeed) ->
    {ok,Pid} = gen_server:start({local, ?MODULE}, ?MODULE, [], []),
    link(Pid),
    {ok,Pid}.
init([]) ->
    process_flag(trap_exit,true),
    {ok, ok}.

But I couldn’t leave it at just that. I had to seek help, so I showed it to my senior colleague Nicolas. By then I had devised a test which would reproduce this neatly. He cut it down a bit, and I boiled it down to the broth you see here, which you can compile and run for yourself.

Just for the record: the seemingly expected behaviour would be to see the exit signals appear in handle_info/2, not to have them take the process down.

%%%-------------------------------------------------------------------
%%% @author Gianfranco <zenon@zen.local>
%%% @copyright (C) 2011, Gianfranco
%%% Created : 17 Jan 2011 by Gianfranco <zenon@zen.local>
%%%-------------------------------------------------------------------
-module(test).

%% API
-export([start_link/1]).
-export([test/1,init/1,handle_info/2,terminate/2]).

-spec(test(fail|succeed) -> term()).
test(Mode) ->
    io:format("Current 0 ~p~n",[self()]),
    spawn(fun() -> io:format("Current 1 ~p~n",[self()]),
                  {ok, _P} = ?MODULE:start_link(Mode)
          end).

start_link(fail) ->
    gen_server:start_link({local,?MODULE},?MODULE,[],[]);
start_link(succeed) ->
    {ok,Pid} = gen_server:start({local, ?MODULE}, ?MODULE, [], []),
    link(Pid),
    {ok,Pid}.    

init([]) ->
    process_flag(trap_exit,true),
    {ok, ok}.

handle_info(timeout,State) -> {stop,normal,State};
handle_info(_Info, State) ->
    io:format("info ~p~n",[_Info]),
    {noreply, State,5000}.

terminate(_Reason, _State) ->
    io:format("reason ~p~n",[_Reason]),
    ok.

Compiling and running, we see both the expected and the unexpected. I chose to call the two modes fail and succeed, based on whether the process dies (fail) or succeeds in trapping the exit (succeed).

zen:Downloads zenon$ erlc test.erl
zen:Downloads zenon$ erl
Erlang R14B (erts-5.8.1) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.8.1  (abort with ^G)
1> test:test(fail).
Current 0 <0.31.0>
Current 1 <0.33.0>
<0.33.0>
reason normal
2> test:test(succeed).
Current 0 <0.31.0>
Current 1 <0.36.0>
<0.36.0>
info {'EXIT',<0.36.0>,normal}
                             (5 seconds later)
reason normal
3>

As you see, in succeed mode the process did not die after initialization. It trapped the spawner’s exit and only stopped on the timeout five seconds later. One possible explanation is the one stated in the module gen_server.erl (read the source, Luke!):

%%% ---------------------------------------------------
%%%
%%% The idea behind THIS server is that the user module
%%% provides (different) functions to handle different
%%% kind of inputs.
%%% If the Parent process terminates the Module:terminate/2
%%% function is called.
%%%

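After some more digging, Nicolas came up with the idea of calling sys:get_status/1 on the processes. What was revealed can be seen below: the parent of the process started with gen_server:start/4 is itself! The spawner’s exit is therefore just another linked process ending, and it lands in handle_info/2 instead of terminating the server.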

sys:get_status(<0.37.0>) = {status,<0.37.0>,
                               {module,gen_server},
                               [[{'$ancestors',[<0.36.0>]},
                                 {'$initial_call',{test,init,1}}],
                                running,<0.37.0>,[],
                                [{header,"Status for generic server test"},
                                 {data,
                                     [{"Status",running},
                                      {"Parent",<0.37.0>},
                                      {"Logged events",[]}]},
                                 {data,[{"State",ok}]}]]}
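If you want to check this without reading the whole status dump, the parent can be picked out of the tuple directly. This is a small sketch that relies on the documented shape of the sys:get_status/1 result; the module name status_parent and the functions parent_of/1 and demo/0 are hypothetical helpers of mine, and gen_server:stop/1 assumes a reasonably recent OTP release.

```erlang
-module(status_parent).
-behaviour(gen_server).

-export([parent_of/1, demo/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Pick the Parent item out of the status tuple:
%% [PDict, SysState, Parent, Dbg, Misc].
parent_of(Server) ->
    {status, _Pid, {module, _Mod},
     [_PDict, _SysState, Parent, _Dbg, _Misc]} = sys:get_status(Server),
    Parent.

%% Compare the parent of a start/3 server with a start_link/3 one.
demo() ->
    {ok, A} = gen_server:start(?MODULE, [], []),
    {ok, B} = gen_server:start_link(?MODULE, [], []),
    Result = {parent_of(A) =:= A, parent_of(B) =:= self()},
    ok = gen_server:stop(A),
    ok = gen_server:stop(B),
    Result.

%% Minimal callbacks so the module can be started as a gen_server.
init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Here status_parent:demo() should give {true, true}: the server started with gen_server:start is its own parent, while the one started with gen_server:start_link has the calling process as its parent.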

/G
