Linkerd + Thrift 0.11.0 - ChannelClosedException spike?

Hi all,

We recently moved one of our downstream services (PHP) to the latest Thrift, 0.11.0. Prior to the upgrade, it was on Thrift 0.9.0. Connections to it go through linkerd from Python Thrift clients, which run version 0.9.0 of the library.

After the upgrade, things mostly worked and we had no catastrophic failures. However, over a period of 24 hours we began to see a consistent spike of ChannelClosedException for connections to this server.

I am just wondering whether anybody here is running Thrift servers on 0.11.0 and has had any issues?

(I think we are on linkerd 1.2.1)

Thanks,
Amit.

Hi @amitsaha, we haven’t seen anything like this before. Could you try connecting one of the Python Thrift 0.9.0 clients directly to the PHP Thrift 0.11.0 service, bypassing Linkerd? I’m wondering if this is due to the Thrift upgrade rather than Linkerd, so let’s check that first.

Thanks. I will get back regarding that.

@amitsaha have you discovered anything further here?

Not yet @Alex - still looking into it :frowning:

Okay, it turns out there was an issue with our PHP Thrift server. We didn’t set the send and receive timeout values explicitly when we switched to 0.11.0. Our earlier internal forked version had been patched manually (yay!) to set the timeouts, and we missed that.

Once we fixed that, we were good!
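
For anyone who lands here later: the exact PHP Thrift calls used for the fix are not shown in this thread. As a rough illustration only, here is a generic Python standard-library sketch of what an explicit receive timeout changes at the socket level. The helper name `recv_with_timeout` and the 0.2 second value are made up for the example and are not part of Thrift or linkerd.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Read up to nbytes, giving up after timeout_s seconds of silence.

    Returns the bytes read, or None if the peer sent nothing in time.
    (Hypothetical helper for illustration; not a Thrift API.)
    """
    # settimeout() bounds blocking operations on this socket,
    # which covers both sends and receives.
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


# A connected pair of sockets stands in for the client and the server.
server_side, client_side = socket.socketpair()

# The client stays silent, so the read gives up instead of blocking forever.
idle_result = recv_with_timeout(server_side, 1024, 0.2)

# Once the client sends something, the same read returns the data.
client_side.sendall(b"ping")
data_result = recv_with_timeout(server_side, 1024, 0.2)

server_side.close()
client_side.close()
```

Without the `settimeout` call, the first read would block indefinitely on a silent peer. An explicit timeout turns that into a bounded wait.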

Ah, that’s great to hear. Glad you figured it out. These sorts of issues can be quite difficult to diagnose.
