linkerd-tcp failing to route to RabbitMQ

I’m in the early stages of testing out linkerd-tcp. It’s working well for MongoDB and Redis, but I haven’t been able to get it to connect to RabbitMQ.

I’ve created a repository to replicate the error: https://github.com/quid/servicediscovery

This is running linkerd-tcp 0.0.3 with namerd 1.20. The RabbitMQ test fails with a timeout, which is similar to the errors I’m seeing when deployed:

ConnectionForced: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
[2018-01-10 22:40:29,940: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@172.17.0.1:7402//: Socket closed.
Trying again in 2.00 seconds...

[2018-01-10 22:40:32,023: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@172.17.0.1:7402//: Socket closed.
Trying again in 4.00 seconds...

[2018-01-10 22:40:40,038: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@172.17.0.1:7402//: timed out.

Running the tests:

docker-compose build && docker-compose run tests
...
collected 5 items

apps/tests/linkerdtcp/test_mongodb.py ...
apps/tests/linkerdtcp/test_rabbitmq.py .F

According to the namerd UI, the route is succeeding, and other services are routing. I can even reach RabbitMQ over HTTP sometimes:

$ curl -I http://localhost:7402
AMQP  

But AMQP connections through linkerd-tcp still time out. Sometimes I get:

connection = pika.BlockingConnection(pika.ConnectionParameters('172.17.0.1', 7402))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 374, in __init__
    self._process_io_for_connection_setup()
  File "/usr/local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 414, in _process_io_for_connection_setup
    self._open_error_result.is_ready)
  File "/usr/local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 466, in _flush_output
    raise maybe_exception
pika.exceptions.IncompatibleProtocolError: (-1, 'EOF')

I’m not sure if linkerd-tcp is dropping the connection or if RabbitMQ is blocking it.
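One way to narrow this down might be a raw-socket probe that sends only the AMQP 0-9-1 protocol header and reports what comes back. This is a rough Python 3 sketch, not something from the test repository: the names `probe` and `classify_reply` and the result labels are made up for illustration. It assumes the broker speaks AMQP 0-9-1, in which case it should answer the header with a method frame (frame type 1).

```python
import socket

# AMQP 0-9-1 protocol header: the literal "AMQP" followed by 0, 0, 9, 1.
AMQP_HEADER = b"AMQP\x00\x00\x09\x01"


def classify_reply(data):
    """Interpret the first bytes read back after sending the protocol header."""
    if not data:
        return "closed"            # peer closed the socket without replying (EOF)
    if data.startswith(b"AMQP"):
        return "version-mismatch"  # broker answered with its own protocol header
    if data[:1] == b"\x01":
        return "broker-replied"    # frame type 1 is a method frame (Connection.Start)
    return "unexpected"


def probe(host, port, timeout=5.0):
    """Send the AMQP header to host:port and classify the response."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "connect-timeout"
    except OSError:
        return "connect-failed"
    try:
        sock.sendall(AMQP_HEADER)
        return classify_reply(sock.recv(8))
    except socket.timeout:
        return "read-timeout"
    except OSError:
        return "reset"
    finally:
        sock.close()
```

Running `probe('172.17.0.1', 7402)` against the linkerd-tcp port and then against the broker’s own address (5672 is RabbitMQ’s default AMQP port) should show whether the two paths behave differently. A "closed" result through the proxy alongside "broker-replied" when connecting directly would suggest the connection is being dropped before the broker’s reply gets back to the client.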

Hi @summatix,

I can confirm that I’ve reproduced this issue using your repository. When I run your tests, I only see the failure intermittently: so far, I’ve run them six times, with three failures and three successes. Is this what you’re seeing as well, or does it always fail for you?

Thanks!

It always fails for me. I’ve noticed intermittent errors when testing manually, however.

I just ran this 6 times with a 100% failure rate.
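For comparing failure rates across machines, something like the following could repeat the run and tally the results. It’s only a sketch: `tally_runs` is a made-up helper, and it assumes the test command exits non-zero when a test fails.

```python
import subprocess


def tally_runs(cmd, runs):
    """Run cmd `runs` times and return (passes, failures) based on exit codes."""
    passes = failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status is inspected.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode == 0:
            passes += 1
        else:
            failures += 1
    return passes, failures


# Example (assumes the exit code of the test run is passed through):
#   tally_runs(["docker-compose", "run", "tests"], runs=6)
```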

Hmm, that’s interesting. Thanks.

I’ll keep you in the loop as I continue investigating!

Has there been any success with the investigation? This RabbitMQ issue is currently blocking us from taking linkerd-tcp to production.

We’re continuing to look into it. Sorry that there hasn’t been much progress yet.

A quick update: we’re working on testing your reproduction against a release candidate for linkerd-tcp 0.1.1. Stay tuned!

Hi again @summatix, I’ve got some good news!

After updating your reproduction to use a linkerd-tcp 0.1.1 release candidate build, the failures no longer occur, even after rerunning the tests several times. We’re working on getting a release version of 0.1.1 ready, but in the meantime, I can push up a release candidate Docker image for you to test, if you’re interested.

Do note that the configuration file syntax for linkerd-tcp changed significantly between v0.0.3 and v0.1.1 (it was made more similar to Linkerd’s configuration syntax), so your test configuration will need to be updated. If you like, I can push a fork of your test repository with the changes I made to the linkerd-tcp config?

That would be great, thanks!

Okay, I’ve pushed an image tagged as linkerd/linkerd-tcp:0.1.1-rc1, and my changes to your reproduction are on GitHub here: https://github.com/hawkw/servicediscovery

Let me know if you still have any failures, and we’ll work on getting 0.1.1 ready to release!

Thanks! I’ve got this successfully deployed now and no hiccups so far communicating with rabbitmq.

Great to hear. Keep us posted. 0.1.1 was officially released just a few minutes ago; it should be basically the same as the RC.