Erlang peculiarities

While working on my WorkerNet post, I stumbled across some weird behaviour involving start_link, trap_exit and slave nodes.

Long Story (sorry, there is no short one)

As I was setting up a distributed test with slave nodes, I also wanted one gen_server to trap exits for the sake of its offspring, which I did not wish to put under a supervisor (shame on me ;). Suddenly, all of the tests stopped working! All of them were either timing out or reporting noproc errors. Bewildered and wide-eyed at 23:40, I gave it a go with the dbg tracer and even went through some of the gen_server source.

No answer.

I chalked it up to the rpc calls to the remote nodes and tried printing out the process ids at each step. But no, it was a fact: my gen_servers died the instant they were created… Brooding over it, I tried some more but finally went to sleep. By then, I knew that the problem was caused by the following two snippets in combination with rpc calls to my local slave nodes:

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.

The version without trap_exit worked like a charm. Not wanting to waste more time on it, I just covered the problem, like a cheap rug over a very dark, very deep and embarrassing hole in the floor, with

start_link(succeed) ->
    {ok,Pid} = gen_server:start({local, ?MODULE}, ?MODULE, [], []),
    link(Pid),
    {ok,Pid}.

init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.

But I couldn’t leave it at just that. I had to seek help, so I showed it to my senior colleague Nicolas. By then I had devised a test which reproduces this neatly. He cut it down a bit, and I boiled it down to the broth you see here, which you can compile and run for yourself.

Just for the record: the seemingly expected behaviour would be for the exit signals to show up in handle_info/2 rather than cause the process to terminate.
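To make that expectation concrete, here is a minimal sketch with plain processes rather than a gen_server. The module name trap_demo and the function run/0 are made up for illustration. It only shows the general rule that a process with trap_exit set receives the exit of a linked process as an ordinary {'EXIT', Pid, Reason} message.

```erlang
-module(trap_demo).
-export([run/0]).

%% Hypothetical demo module, not part of the original test.
%% The trapping is done in a helper process so that the caller
%% (for example the shell) keeps its own process flags untouched.
run() ->
    Caller = self(),
    Ref = make_ref(),
    spawn(fun() ->
                  %% From here on, exit signals arrive as ordinary messages.
                  process_flag(trap_exit, true),
                  %% The child returns at once, i.e. exits with reason normal.
                  Child = spawn_link(fun() -> ok end),
                  receive
                      {'EXIT', Child, Reason} ->
                          Caller ! {Ref, {trapped, Reason}}
                  after 1000 ->
                          Caller ! {Ref, timeout}
                  end
          end),
    receive
        {Ref, Result} -> Result
    after 2000 ->
            timeout
    end.
```

Calling trap_demo:run() should return {trapped,normal}: the linked child's exit was turned into a message instead of being silently dropped.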

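%%% @author Gianfranco <zenon@zen.local>
%%% @copyright (C) 2011, Gianfranco
%%% Created : 17 Jan 2011 by Gianfranco <zenon@zen.local>
-module(test).
-behaviour(gen_server).

%% API
-export([test/1, start_link/1]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

-spec(test(fail|succeed) -> term()).
test(Mode) ->
    io:format("Current 0 ~p~n",[self()]),
    %% The spawned process starts the server and then exits normally
    spawn(fun() -> io:format("Current 1 ~p~n",[self()]),
                   {ok, _P} = ?MODULE:start_link(Mode)
          end).

%% fail: the spawner becomes the gen_server's parent
start_link(fail) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []);
%% succeed: start without a parent, then link by hand
start_link(succeed) ->
    {ok,Pid} = gen_server:start({local, ?MODULE}, ?MODULE, [], []),
    link(Pid),
    {ok,Pid}.

init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.

handle_call(_Request, _From, State) -> {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.

handle_info(timeout,State) -> {stop,normal,State};
handle_info(Info, State) ->
    io:format("info ~p~n",[Info]),
    {noreply, State,5000}.

terminate(Reason, _State) ->
    io:format("reason ~p~n",[Reason]),
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.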

Compiling and running, we see both the expected and the unexpected. I chose to call the two modes succeed and fail, based on whether the process succeeds in trapping the exit signal (succeed) or dies (fail).

zen:Downloads zenon$ erlc test.erl
zen:Downloads zenon$ erl
Erlang R14B (erts-5.8.1) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe]

Eshell V5.8.1  (abort with ^G)
1> test:test(fail).
Current 0 <0.31.0>
Current 1 <0.33.0>
reason normal
2> test:test(succeed).
Current 0 <0.31.0>
Current 1 <0.36.0>
info {'EXIT',<0.36.0>,normal}
                             (5 seconds later)
reason normal

As you can see, in the succeed case the process did not die after initialization; it trapped the spawner’s exit. One possible explanation is the one stated in the gen_server.erl module itself (read the source, Luke!)

%%% ---------------------------------------------------
%%% The idea behind THIS server is that the user module
%%% provides (different) functions to handle different
%%% kind of inputs.
%%% If the Parent process terminates the Module:terminate/2
%%% function is called.
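What that comment implies can be illustrated with a toy receive loop. This is a simplified sketch, not the actual gen_server code, and the module name parent_loop and the function demo/0 are hypothetical. The point is only that an 'EXIT' message from the parent is matched separately and leads to termination, while an 'EXIT' from any other linked process would be passed on as a normal message.

```erlang
-module(parent_loop).
-export([demo/0]).

%% Hypothetical demo, not the real gen_server implementation.
%% A short-lived "parent" starts a trapping server and then exits
%% normally; the server reports what it did to the caller.
demo() ->
    Observer = self(),
    spawn(fun() ->
                  Parent = self(),
                  spawn_link(fun() ->
                                     process_flag(trap_exit, true),
                                     Parent ! ready,
                                     loop(Parent, Observer)
                             end),
                  %% Wait until the server traps exits, then fall off
                  %% the end of the fun, i.e. exit with reason normal.
                  receive ready -> ok end
          end),
    receive
        {terminate, _} = Event -> Event;
        {info, _} = Event -> Event
    after 2000 ->
            timeout
    end.

loop(Parent, Observer) ->
    receive
        {'EXIT', Parent, Reason} ->
            %% Exit of the parent: stop, roughly where terminate/2 would run.
            Observer ! {terminate, Reason};
        {'EXIT', Other, Reason} ->
            %% Exit of any other linked process: roughly what
            %% handle_info/2 would be given.
            Observer ! {info, {'EXIT', Other, Reason}},
            loop(Parent, Observer)
    end.
```

With these assumptions parent_loop:demo() should return {terminate,normal}, which mirrors the fail case above: the exit of the starting process never reaches the "handle_info" branch.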

After some more digging, Nicolas came up with the idea of calling sys:get_status/1 on the processes. What was revealed can be seen below: the parent of the process started with gen_server:start/4 is the process itself!

Sys:get_status(<0.37.0>) = {status,<0.37.0>,
                                [{header,"Status for generic server test"},
                                      {"Logged events",[]}]},
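For anyone who wants to repeat the check, the parent can be picked out of the status tuple directly. The helper below is a small sketch; the module name parent_of is made up, and it assumes the five-element item list that the sys documentation describes for sys:get_status/1 (process dictionary, system state, parent, debug options, misc).

```erlang
-module(parent_of).
-export([parent/1]).

%% Hypothetical helper: returns the pid that an OTP process
%% (gen_server, gen_event, ...) regards as its parent.
parent(Pid) ->
    {status, Pid, {module, _Mod},
     [_PDict, _SysState, Parent, _Dbg, _Misc]} = sys:get_status(Pid),
    Parent.
```

For a server started with gen_server:start_link, parent_of:parent(Pid) should give the pid of the process that called start_link; for one started with gen_server:start it should give Pid itself, which is why the hand-made link in the succeed case is treated like any other link.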

