Reported by g_makulik (08 Jun 2009 14:00 UTC)
While writing test cases for my OpenAMQ WireAPI C++ wrapper classes, I found strange server behavior that looks to me like a resource leak.
The test case is the following: I have a client that repeatedly opens and closes channels (= connection + session) within the same process.
If I run amq_server and only this client, everything works fine. But as soon as I start another application before that client, one that simply connects to the server
and starts to consume from an exchange/queue pair that it created itself, I get errors (connection to server lost) when running
the channel open/close client described before. I can "stretch" the number of channel open attempts that succeed before one fails by increasing the number of polling and working threads of the
server. Another thing I noticed in this context were frequent 'server heartbeat slowing' warnings after the test case had run.
'amq_server -v' output:
{{OpenAMQ/1.3d0 - revision 12075
Debug release for internal use only
Copyright (c) 2007-2009 iMatix Corporation
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Build model:Debug release for internal use only
Memory model: fat
Threading model: multithreaded
Compiler: gcc -c -I/home/freescale/workspace/ufx1000_software/OpenAMQ/openamq/_install/include -g -DDEBUG -O -Wall -pthread -D_REENTRANT -DICL_MEM_DEFAULT_DIc}}
This is what my Linux system prints at startup:
Image Name: Linux-2.6.23
Created: 2008-12-22 3:22:49 UTC
Image Type: PowerPC Linux Kernel Image (gzip compressed)
…
Here's the source code of my client applications (I rewrote them to use plain WireAPI instead of my API classes, to rule out errors or flaws in my own code):
Application 1, TestReceiver (runs first):
#include "wireapi.h"
#include <iostream>

using namespace std;

int main(int argc, char** argv)
{
    icl_system_initialise(argc, argv);

    icl_longstr_t* auth_data = NULL;            // Authentication data
    amq_client_connection_t* pconn = NULL;      // NULL so the destroy calls below are safe
    amq_client_session_t* psession = NULL;

    cout << "TestReceiver: Opening connection ..." << endl;
    auth_data = amq_client_connection_auth_plain("guest", "guest");
    pconn = amq_client_connection_new("localhost", "/", auth_data, "", 0, 30000);
    icl_longstr_destroy(&auth_data);            // Not needed after the connection attempt
    if (pconn != NULL)
    {
        psession = amq_client_session_new(pconn);
        if (psession != NULL)
        {
            cout << "TestReceiver: Declare exchange 'TestExchange' ..." << endl;
            amq_client_session_exchange_declare
                ( psession
                , 0                 // ticket
                , "TestExchange"    // exchange
                , "direct"          // type
                , 0                 // passive
                , 0                 // durable
                , 0                 // auto_delete
                , 0                 // internal
                , NULL              // arguments
                );
            cout << "TestReceiver: Creating queue ..." << endl;
            amq_client_session_queue_declare
                ( psession
                , 0                 // ticket
                , NULL              // queue (NULL: server assigns a name)
                , 0                 // passive
                , 0                 // durable
                , 0                 // exclusive
                , 0                 // auto_delete
                , NULL              // arguments
                );
            // The server-assigned queue name is stored in the session
            char* queueName = psession->queue;
            cout << "TestReceiver: Binding queue, routing key is 'TestRouting' ..." << endl;
            amq_client_session_queue_bind
                ( psession
                , 0                 // ticket
                , queueName         // queue
                , "TestExchange"    // exchange
                , "TestRouting"     // routing_key
                , NULL );           // arguments
            cout << "TestReceiver: Consume messages from the queue ..." << endl;
            amq_client_session_basic_consume
                ( psession
                , 0                 // ticket
                , queueName         // queue
                , ""                // consumer_tag
                , 0                 // no_local
                , 0                 // no_ack
                , 0                 // exclusive
                , NULL              // arguments
                );
            cout << "TestReceiver: Waiting for incoming messages ..." << endl;
            while (1)
            {
                if (amq_client_session_wait(psession, 0) == 0)
                {
                    int messageCount = amq_client_session_get_basic_arrived_count(psession);
                    if (messageCount > 0)
                    {
                        // Get next message
                        amq_content_basic_t* pmessage;
                        pmessage = amq_client_session_basic_arrived(psession);
                        for (int i = 0;
                             i < messageCount &&
                             pmessage != NULL;
                             ++i)
                        {
                            // Get the message body and write it to stdout
                            size_t size = amq_content_basic_get_body_size(pmessage);
                            byte* textBuffer = new byte[size + 1];
                            amq_content_basic_get_body(pmessage, textBuffer, size);
                            textBuffer[size] = 0;
                            cout << "TestReceiver: received message '" << (char*)textBuffer << "' ..." << endl;
                            delete[] textBuffer;
                            // Release our reference; iCL objects must not be freed with C++ delete
                            amq_content_basic_unlink(&pmessage);
                            // Get next message
                            pmessage = amq_client_session_basic_arrived(psession);
                        }
                    }
                }
                // Exit the loop if Ctrl+C is encountered
                if (amq_client_connection_get_alive(pconn) != TRUE)
                {
                    break;
                }
            }
        }
    }
    amq_client_session_destroy(&psession);
    amq_client_connection_destroy(&pconn);
    icl_system_terminate();
    return 0;
}
Application 2, ChannelOpenCloseTest (runs second):
#include <iostream>
#include "wireapi.h"

using namespace std;

int main(int argc, char* argv[])
{
    icl_system_initialise(argc, argv);

    icl_longstr_t* auth_data;                   // Authentication data
    amq_client_connection_t* pconn;
    amq_client_session_t* psession;

    // Open and close a connection + session 30 times in a row
    for (int i = 0; i < 30; ++i)
    {
        auth_data = amq_client_connection_auth_plain("guest", "guest");
        pconn = amq_client_connection_new("localhost", "/", auth_data, "", 0, 30000);
        if (pconn != NULL)
        {
            psession = amq_client_session_new(pconn);
            cout << "Created connection " << i << endl;
            if (psession != NULL)
            {
                amq_client_session_destroy(&psession);
                psession = NULL;
            }
            amq_client_connection_destroy(&pconn);
            pconn = NULL;
        }
        icl_longstr_destroy(&auth_data);
    }
    icl_system_terminate();
}
Comments
Unfortunately I can't reproduce this behavior under SuSE 11.0 on my x86 machine …
So what platform are you actually testing on? Can you try putting the client on SuSE, and then switch it around, with the server on SuSE?
Just to be clear here, you're observing the problem with a cross-compiled amq_server? On what target platform? PowerPC?
I will try and reproduce the issue on x86 here. Failing that we'd have to delve into your exact changes to the cross build environment which may be a fairly big job.
@mato:
Yes, exactly.
Ok, when I run the server on the x86 machine and both client apps on the PowerPC platform, everything's fine.
When I run amq_server on the PowerPC platform and the client apps on the x86 machine, the behavior is as described.
So it seems that this really is a problem with the amq_server cross-compiled for the PowerPC platform. Does amq_server use
APR for thread management? Could it be that some APR configuration value I am using causes unexpected behavior?
The number of successful channel open attempts relates quite closely to the number of working/polling threads configured for the server.
With the defaults of 4, every 4th attempt fails; if I set the numbers to e.g. 10, every 10th attempt fails, and so on.
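To make the reported pattern concrete, here is a minimal sketch that lists which attempts of a test run would be expected to fail if every N-th attempt is lost. It only restates the observation above; it does not model the server. The helper name `expected_failures` and the 1-based counting of attempts are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: 1-based numbers of the attempts expected to fail when
// every `threads`-th channel open attempt is lost, as observed in the report.
std::vector<int> expected_failures(int attempts, int threads)
{
    assert(threads > 0);
    std::vector<int> failed;
    for (int attempt = 1; attempt <= attempts; ++attempt) {
        if (attempt % threads == 0) {
            failed.push_back(attempt);
        }
    }
    return failed;
}
```

For the 30 iterations of ChannelOpenCloseTest, this would predict 7 lost connections with 4 threads and 3 with 10 threads, which can be compared against the actual client output.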
Ok so that points to the PowerPC build as the likely cause of the problem.
To answer your question, yes, amq_server (and clients) do use APR for thread management and the use is fairly substantial. Could you summarize the modifications you had to make to the build to get it to build on your target platform? I recall some messages on the mailing lists regarding pthread rwlock tests failing which might be part of the problem.
Here are the additional config options for APR that I use for cross-compiling:
I am still using my own patch of boomake in foreign/apr; the only difference from Pieter's changes that I have noticed is that I put the additional options at the front of conf_opts at line 154:
I am not sure about the settings of apr_cv_process_shared, apr_cv_mutex_robust_shared and apr_cv_mutex_recursive,
which might affect threading behavior. But it could be something completely different …
Regarding the post on the mailing list you mentioned: the issue there was that the APR configure script didn't add -lpthread when compiling
the rwlock test. I added it using the LIBS variable set in my build environment. After that, APR configured and compiled without
errors. I have also checked the pthread implementation on my platform, and as far as I can see rwlocks are supported.
I checked the APR flags, the only one of consequence might be apr_cv_mutex_recursive. The other changes you made all concern cross-process support which we don't use.
I can offer the following suggestions:
Can you build natively on your target platform? It might be a good idea to run the APR regression tests on your platform (see test directory in APR source) and see if anything breaks.
Also, the version of APR used in OpenAMQ 1.3 is ancient so I would try and build OpenAMQ 1.4 which uses the latest APR. In fact, I'd probably try this first of all and see if anything changes.
Thanks mato. I have done a test with apr_cv_mutex_recursive=no. It doesn't change the behavior of my test case, and even worse, amq_content_basic_test then fails with a segfault. I inserted some additional icl_console_print lines in the
amq_content_basic_selftest method to see which of the (sub-)tests fails, but after doing that, the test ran without a segfault :-(.
I also think the best option is to try OpenAMQ 1.4 with the latest APR; I have seen several fixes in the APR change logs that might affect
threading behavior.
Much more was needed than just taking the latest version of OpenAMQ / APR, and I don't even think the upgrade was what really made the difference.
The actual problem was to configure and cross-compile APR correctly, which isn't an easy task. In particular, I had no way to compile on my target board, which seemed to be a viable solution for other people cross-compiling APR that I came across in my web research.
I took the following approach: I compiled one version for my x86 machine and one for the PowerPC and compared the results of configure (the last part of foreign/apr/apr/config.log, starting from ## Cache variables. ##). There were about a dozen and a half (relevant) differences in the configuration variables and in the decisions made by the configure script. Where I suspected that configure had made a wrong decision because the test program couldn't be run when cross-compiling, I took the test program, compiled it manually, and ran it on my target. If the program ran successfully, I overrode the corresponding configuration variable.
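The comparison of the two config.log files could be automated along the following lines. This is only a sketch under assumptions: it expects the usual autoconf layout in which the cache variables appear as name=value lines after a "## Cache variables. ##" banner, and the names `read_cache_vars` and `diff_cache_vars` are hypothetical helpers, not part of APR or OpenAMQ.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using CacheVars = std::map<std::string, std::string>;
using CacheDiff = std::map<std::string, std::pair<std::string, std::string>>;

// Collect the name=value lines that follow the "## Cache variables. ##"
// banner of a config.log, stopping at the banner of the next section.
CacheVars read_cache_vars(const std::string& config_log_text)
{
    CacheVars vars;
    std::istringstream log(config_log_text);
    std::string line;
    bool in_section = false;
    while (std::getline(log, line)) {
        if (line.find("## Cache variables. ##") != std::string::npos) {
            in_section = true;
            continue;
        }
        if (!in_section) continue;
        if (line.compare(0, 2, "##") == 0) {
            // "## ---- ##" rule lines are skipped; a banner containing
            // letters is the title of the next section, so stop there.
            bool has_text = std::any_of(line.begin(), line.end(),
                [](unsigned char c) { return std::isalpha(c) != 0; });
            if (has_text) break;
            continue;
        }
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0) continue;
        vars[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return vars;
}

// Report every variable whose value differs between the native and the
// cross build, or that exists in only one of them ("<unset>" marks the
// missing side). The pair holds (native value, cross value).
CacheDiff diff_cache_vars(const CacheVars& native, const CacheVars& cross)
{
    CacheDiff diff;
    for (const auto& kv : native) {
        auto it = cross.find(kv.first);
        std::string other = (it == cross.end()) ? "<unset>" : it->second;
        if (other != kv.second) diff[kv.first] = {kv.second, other};
    }
    for (const auto& kv : cross) {
        if (native.find(kv.first) == native.end())
            diff[kv.first] = {"<unset>", kv.second};
    }
    return diff;
}
```

Reading both config.log files into strings and passing the parsed maps to `diff_cache_vars` yields the list of variables to examine one by one on the target.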
My goal was to get exactly the same configuration values as with the x86 compile. That worked fine for almost all configuration values except ac_cv_file__dev_zero and apr_cv_epoll. To get the latter working, it was necessary to enable event polling in my kernel configuration. For the other one I had to do an ugly hack in the configure script itself, because even if you tell configure ac_cv_file__dev_zero=yes, this is ignored when configure checks whether mmap() works with /dev/zero.
To work around this, I manually introduced another configuration variable, ac_cv_mmap_dev_zero_works, in the script to skip this test. I'm not experienced with autoconf, but IMO this should be changed in the source from which the configure script is generated.
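A probe of the kind that can be compiled by hand and run on the target to see whether mmap() works with /dev/zero might look like the sketch below. It is a stand-in written for illustration, not the test program from the APR configure script; the function name `mmap_works_on` is hypothetical, and the use of a private, page-sized mapping is an assumption (the real check may use different flags and sizes).

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Returns true if `path` can be opened and mapped, the mapping reads back
// as zeros, and a write to it succeeds.
bool mmap_works_on(const char* path)
{
    const std::size_t length = 4096;
    int fd = open(path, O_RDWR);
    if (fd < 0) return false;
    void* mem = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) return false;
    unsigned char* bytes = static_cast<unsigned char*>(mem);
    bool ok = (bytes[0] == 0 && bytes[length - 1] == 0);
    bytes[0] = 1;  // private mapping, so this must not fault
    ok = ok && (bytes[0] == 1);
    munmap(mem, length);
    return ok;
}
```

On the target, a small main() that returns 0 when `mmap_works_on("/dev/zero")` is true gives the same yes/no answer that configure cannot obtain while cross-compiling.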
For overriding the configuration values I use the following settings now:
By now I believe that the strange behavior was related more to the socket handling mechanism chosen by APR than to threading behavior. I'm not sure, but having epoll seems to play a significant role here.
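Whether the kernel on the target actually provides epoll can be checked with a small probe like the following sketch. It is not the check that APR's configure script performs; the function name `epoll_works` is hypothetical, and the probe assumes a Linux target with `<sys/epoll.h>` available in the toolchain.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Returns true if the running kernel provides a working epoll: an epoll
// instance can be created, a pipe registered, and a readiness event reported.
bool epoll_works()
{
    int ep = epoll_create(1);
    if (ep < 0) return false;  // e.g. ENOSYS when the kernel lacks epoll
    int fds[2];
    if (pipe(fds) != 0) {
        close(ep);
        return false;
    }
    epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    bool ok = epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0;
    if (ok) {
        char byte = 'x';
        ok = write(fds[1], &byte, 1) == 1;  // make the read end readable
    }
    if (ok) {
        epoll_event out = {};
        ok = epoll_wait(ep, &out, 1, 1000) == 1 && out.data.fd == fds[0];
    }
    close(fds[0]);
    close(fds[1]);
    close(ep);
    return ok;
}
```

If this returns false on the target while apr_cv_epoll is forced to yes, APR would be built to rely on a facility the kernel does not offer.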
@pieter:
How are you managing the APR configure script? Is it generated from autoconf input, or used just as provided? I couldn't figure out which source it's generated from.
I would like to provide a patch for the ac_cv_file__dev_zero issue, but I don't know the best way to do it. I searched the
APR mailing list and found a post about the same issue in a recent thread, so I also replied there:
[http://www.nabble.com/-vote--release-apr-1.3.4%2C-apr-util-1.3.5-ts23775995i20.html#a24162578]