While working on my WorkerNet post, I stumbled across a weird behaviour with start_links, trap_exit and slave nodes.
Long Story (sorry, there is no short one)
As I was setting up a distributed test with slaves, I also wanted one gen_server to trap exits for the sake of its offspring, which I did not wish to put under a supervisor (shame on me). Suddenly, all of the tests stopped working! All of them were either timing out or reporting direct noprocs. Bewildered and wide-eyed at 23:40, I gave it a go with the dbg tracer and even went through some of the gen_server source.
No answer.
I chalked it up to the rpc calls for the remote nodes and tried printing out the pids at each step. But no, it was a fact: my gen_servers died the instant they were created… Brooding over it, I tried some more but finally went to sleep. By then, I knew that the problem was caused by the following two snippets in combination with rpc calls to my local slave nodes:
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.
The version without trap_exit, meanwhile, worked like a charm. Not wanting to waste more time on it, I just covered the problem up, like a cheap rug over a very dark, very deep and embarrassing hole in the floor, with:
start_link(succeed) ->
    {ok, Pid} = gen_server:start({local, ?MODULE}, ?MODULE, [], []),
    link(Pid),
    {ok, Pid}.

init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.
But I couldn’t leave it at just that. I had to seek help, and so I showed it to my senior colleague Nicolas. By then I had devised a test which would reproduce this neatly. He cut it down a bit, and I boiled it down to the broth you see here, which you can compile and run for yourself.
Just for the record: the seemingly expected behaviour would be for the exit signals to show up as messages in handle_info/2, not to take the process down.
%%%-------------------------------------------------------------------
%%% @author Gianfranco <zenon@zen.local>
%%% @copyright (C) 2011, Gianfranco
%%% Created : 17 Jan 2011 by Gianfranco <zenon@zen.local>
%%%-------------------------------------------------------------------
-module(test).

%% API
-export([start_link/1]).
-export([test/1, init/1, handle_info/2, terminate/2]).

-spec test(fail | succeed) -> term().
test(Mode) ->
    io:format("Current 0 ~p~n", [self()]),
    %% The spawned process starts the server and then exits with reason
    %% normal, which sends an exit signal to the linked server.
    spawn(fun() ->
                  io:format("Current 1 ~p~n", [self()]),
                  {ok, _P} = ?MODULE:start_link(Mode)
          end).

%% fail: the spawner becomes the gen_server's parent.
start_link(fail) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []);
%% succeed: start unlinked, then link by hand.
start_link(succeed) ->
    {ok, Pid} = gen_server:start({local, ?MODULE}, ?MODULE, [], []),
    link(Pid),
    {ok, Pid}.

init([]) ->
    process_flag(trap_exit, true),
    {ok, ok}.

handle_info(timeout, State) ->
    {stop, normal, State};
handle_info(Info, State) ->
    io:format("info ~p~n", [Info]),
    {noreply, State, 5000}.

terminate(Reason, _State) ->
    io:format("reason ~p~n", [Reason]),
    ok.
Compiling and running it, we see both the expected and the unexpected behaviour. I chose to call the two modes fail and succeed, depending on whether the process dies (fail) or succeeds in trapping the exit (succeed):
zen:Downloads zenon$ erlc test.erl
zen:Downloads zenon$ erl
Erlang R14B (erts-5.8.1) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe]
[kernel-poll:false]
Eshell V5.8.1 (abort with ^G)
1> test:test(fail).
Current 0 <0.31.0>
Current 1 <0.33.0>
<0.33.0>
reason normal
2> test:test(succeed).
Current 0 <0.31.0>
Current 1 <0.36.0>
<0.36.0>
info {'EXIT',<0.36.0>,normal}
(5 seconds later)
reason normal
3>
As you can see, in the succeed case the process did not die after initialization: it trapped the spawner’s exit and saw it in handle_info/2. One possible explanation is stated in the module gen_server.erl (read the source, Luke!):
%%% ---------------------------------------------------
%%%
%%% The idea behind THIS server is that the user module
%%% provides (different) functions to handle different
%%% kind of inputs.
%%% If the Parent process terminates the Module:terminate/2
%%% function is called.
%%%
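To make that comment concrete, here is a minimal sketch of how I read it. This is not the actual gen_server source, just a hand-rolled loop under the assumption that the server remembers its parent and treats an exit signal from that one pid differently from all others. The module name parent_exit_sketch and its functions are made up for illustration.

```erlang
-module(parent_exit_sketch).
-export([start_link/0, start/0]).

%% Linked start: the calling process is remembered as the parent.
start_link() ->
    Parent = self(),
    spawn_link(fun() -> init(Parent) end).

%% Unlinked start: self() is evaluated inside the new process,
%% so the process ends up being its own parent.
start() ->
    spawn(fun() -> init(self()) end).

init(Parent) ->
    process_flag(trap_exit, true),
    loop(Parent).

loop(Parent) ->
    receive
        {'EXIT', Parent, Reason} ->
            %% Exit signal from the remembered parent: shut down
            %% with the same reason instead of handling a message.
            io:format("reason ~p~n", [Reason]),
            exit(Reason);
        {'EXIT', _From, _Reason} = Info ->
            %% Exit signal from any other linked process: an
            %% ordinary message, and the loop carries on.
            io:format("info ~p~n", [Info]),
            loop(Parent)
    end.
```

With start_link/0 the death of the caller ends the loop even though exits are trapped. With start/0 followed by a manual link/1, the same signal falls into the second clause, which mirrors the fail and succeed runs above.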
After some more digging, Nicolas came up with the idea of calling sys:get_status/1 on the processes. What was revealed can be seen below: the parent of the process started with gen_server:start/4 is the process itself!
sys:get_status(<0.37.0>) =
    {status,<0.37.0>,
     {module,gen_server},
     [[{'$ancestors',[<0.36.0>]},
       {'$initial_call',{test,init,1}}],
      running,<0.37.0>,[],
      [{header,"Status for generic server test"},
       {data,
        [{"Status",running},
         {"Parent",<0.37.0>},
         {"Logged events",[]}]},
       {data,[{"State",ok}]}]]}
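If you want to check this without squinting at the whole tuple, the parent can be picked out of the status by pattern matching. The sketch below assumes the status shape shown above, with the parent as the third element of the inner list. The module name parent_of and the function parent/1 are my own invention, not part of OTP.

```erlang
-module(parent_of).
-export([parent/1]).

%% Returns the pid that the given process regards as its parent,
%% as reported by sys:get_status/1. Works for processes that answer
%% system messages, such as gen_servers.
parent(Pid) ->
    {status, _StatusPid, {module, _Mod},
     [_PDict, _SysState, Parent, _Debug, _Misc]} = sys:get_status(Pid),
    Parent.
```

For a server started with gen_server:start, comparing parent_of:parent(Pid) with Pid itself should then come out equal, while a server started with gen_server:start_link should report the pid of the process that started it.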
/G