Coda File System

Re: Coda git repository available

From: <u-myfx_at_aetey.se>
Date: Thu, 21 Apr 2016 17:47:33 +0200

Hello Jan,

On Thu, Apr 21, 2016 at 10:57:21AM -0400, Jan Harkes wrote:
> On Thu, Apr 21, 2016 at 02:16:02PM +0200, u-myfx_at_aetey.se wrote:
> > We did not see any practical problems or extra stalls caused by
> > synchronous DNS resolution. Definitely not an issue in our workloads.
> > Of course nothing precludes changing to asynchronous resolution if
> > needed but the effort and possible dependencies are hardly justified.
> 
> I am literally fuming reading this. I don't know if you remember, but
> several years ago you had me chasing down a server 'deadlock' issue
> related to callbacks, which I was unable to reproduce and I spent about
> a week on this going back and forth with new patches trying to turn
> readlocks into writelocks in the hope it would avoid some possible lock
> ordering issue, adding global timeouts to the callback break multirpc
> calls and other workarounds....
> 
> You were running your servers with clients that were doing ******
> synchronous DNS lookups?

No.

(we switched to the DNS-based server lookups in 2014)

> And you don't think that would be causing any
> practical problems or extra stalls?
> 
> > If not otherwise, the presence of callbacks is much more of a concern.
> > In a file system where clients go disconnected as a matter of normal
> > operation, callbacks do not give much benefit, at the same time callback
> > breaking _does_ cause stalls.
> 
> And on top of that you are blaming the callbacks for your woes.

On top of what?

> Sorry, but I have to cool down before I can respond to any of the rest

This would certainly not hurt. :)

> of your email. In the meantime I'll be busy finding and reverting the
> patch that introduced a global timeout for callback rpcs and any other
> possible regressions that may have been introduced.

Oh, thanks for looking. It would be nice if such stalls could be eliminated.

(When you have several hundred clients holding callbacks on a volume or
on a common directory, some of them are bound to be disconnected or dead
when you happen to update something, and then it takes time to break
their callbacks...)

> Jan

Best regards,
Rune
Received on 2016-04-21 11:48:07