Wednesday, June 18, 2008

The More Things Slow Down, The More They Heat Up

Well, it's that time of year again. The big push to get CDT 5.0 out the door is essentially over. Ganymede bits are being uploaded to the servers by the various projects. Things are looking good. It seems like it's just about time to finally sit out on a patio somewhere with a nice frosty beverage and relax.

Or is it?

Firstly, I'm busy prepping to participate again this year in the Ride for Sight, the longest-running motorcycle charity in Canada. It's going to be a full day this coming Saturday, with many hours of riding, and I have to make sure all my gear is in order. It's also the final stretch of fundraising, so I'm busily annoying all my friends. (Shameless plug... to find out how you can donate, go to my donation page.) Hopefully this weekend I won't forget my hat and get sunburnt like I did at the Port Dover Friday the 13th Rally last weekend...

But more on topic, my team still has a lot of work going on. Right now most of us are working on Remote Development Tools (RDT), which is not on the Ganymede train. We just contributed our initial drop of our C/C++ remote indexing tools to Bugzilla for consideration, and we're still working hard to get more done so that we can get to the point where people might actually start using this stuff.

So... what is this stuff?

Essentially, we are trying to create a development environment where you can run your IDE on your local workstation, but the actual code resides on another target machine. Maybe this machine is a mainframe with no graphical windowing capabilities, maybe it's a gigantic supercomputer that you don't physically have on your desk (or if you did, you'd need a REALLY big desk...). In any case, the code you're actually working on resides somewhere that is not local.

Many of the most exciting value-adds Eclipse provides over other development environments require knowledge of the structure of the user's source code. Source code navigation, content assist, intelligent search, the call hierarchy and type hierarchy views, the include browser, refactoring, and more all require parsing the user's source code and producing an index that allows for name-based lookups of source code elements.
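
To make that concrete, here's a deliberately simplified sketch of the kind of name-based lookup an index provides. This is hypothetical plain Java, not the actual CDT index (the PDOM), which is a persisted on-disk structure storing far richer binding information; the essential contract is similar, though:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, much-simplified source code index: a map from element
// names to the places they occur.
class SourceIndex {

    // Where a named element (function, type, macro...) occurs.
    static class Location {
        final String file;
        final int offset;

        Location(String file, int offset) {
            this.file = file;
            this.offset = offset;
        }
    }

    private final Map<String, List<Location>> names =
            new HashMap<String, List<Location>>();

    void add(String name, Location where) {
        List<Location> locations = names.get(name);
        if (locations == null) {
            locations = new ArrayList<Location>();
            names.put(name, locations);
        }
        locations.add(where);
    }

    // Open Declaration, search, the call hierarchy, and friends all
    // boil down to fast queries of this shape.
    List<Location> lookup(String name) {
        List<Location> locations = names.get(name);
        return locations == null ? new ArrayList<Location>() : locations;
    }
}
```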

Parsing and indexing are both very CPU- and memory-intensive operations, and good performance is a key requirement if these features are to be usable by the average user. The remote scenario presents some unique, additional challenges that have to be overcome for the features to work quickly and correctly.

Some important points to consider:

  • Network mounting the files and operating on them "locally" has proven to be slow in our testing, even on fast (100 megabit) connections with very few intermediate hops.
  • Downloading the entire set of remote files (both project files and included system headers, which are generally not found on the local machine) is similarly slow.
  • Sometimes the remote machine uses a different text encoding (codepage) than the local machine. This means that not only must the source files be transferred, they may also have to undergo a codepage conversion, which slows things down even further.
  • Abstract Syntax Trees (ASTs) and indices are typically much larger than the original source code from which they are produced, because they store much more information. That is, they capture a certain amount of syntactic and/or semantic understanding, which is inherently more information than the raw bytes of the source text impart. As such, transferring ASTs or indices is even more impractical than transferring the original source.
  • The way a file needs to be parsed in order to be interpreted correctly often depends on the manner in which it is compiled. E.g., macros and include paths may be defined/redefined on the command line of any individual compilation of any individual file. A successful parse requires that those same macros and include paths be provided to the parser when it runs (see the sketch after this list).
  • The remote machine often has greater CPU power than the local machine, so it can complete parsing and indexing tasks more quickly.
  • Remote machines are often geographically distant. The intermediate network topology can be complicated, with many hops and slow links. To maintain performance, as little data as possible should be transferred back and forth between the local machine and the remote machine.
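
To illustrate the compile-configuration point: a parse is only faithful if the parser is handed the same macro definitions and include paths the compiler saw. Here's a hypothetical sketch (plain Java, not the actual CDT API) of the kind of per-file configuration that has to be captured and handed to wherever the parsing happens:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-file parse configuration. If foo.c was compiled as
//   gcc -DDEBUG -DVERSION=2 -I/opt/proj/include -c foo.c
// then the parser must be given the same -D and -I settings, or it may
// take the wrong #ifdef branches and fail to resolve #includes.
class ParseConfiguration {
    final Map<String, String> macros = new HashMap<String, String>();
    final List<String> includePaths;

    ParseConfiguration(List<String> includePaths) {
        this.includePaths = includePaths;
    }
}

class ConfigurationExample {
    static ParseConfiguration forFooDotC() {
        ParseConfiguration config = new ParseConfiguration(
                Arrays.asList("/opt/proj/include"));
        config.macros.put("DEBUG", "1");   // -DDEBUG
        config.macros.put("VERSION", "2"); // -DVERSION=2
        return config;
    }
}
```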

We therefore feel that if the Remote Development Tools are to be successful, they must provide remote services that allow all of the parsing and indexing to happen on the remote machine. The local machine can query the remote machine for the data it is interested in, and only that data gets transferred over the remote connection.
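
In other words, the index lives where the code lives. As a rough sketch (a hypothetical interface, not the actual RDT service API), the local workstation ends up talking to something shaped like this:

```java
import java.util.List;

// Hypothetical remote service contract: parsing and indexing happen on
// the remote host; the workstation sends small queries and gets back
// small result sets, instead of shipping sources or indices around.
interface RemoteIndexService {

    // Re-parse and re-index the given files on the remote machine.
    void index(String project, List<String> changedFiles);

    // Name-based queries; only the matching locations cross the wire.
    List<String> findDeclarations(String project, String elementName);
    List<String> findReferences(String project, String elementName);
}
```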

So, that's the motivation. We just contributed a framework and reference implementation providing the following features for C/C++:

  • A New Remote C/C++ Project wizard that allows you to create remote projects and configure your service model (files are served by EFS)
  • Integrated index lifecycle management
  • Automatic source code delta handling (the index is automatically updated when files in your project are added/removed/changed)
  • Remote Search
  • Remote Call Hierarchy
  • Remote Navigation (e.g. Go To Declaration)

Other planned services include:
  • Content Assist
  • Type Hierarchy
  • Include browser
  • Model builder

But that's not all. Currently I'm working on a remote make builder, which will essentially let you farm out the build commands for your project over the protocol of your choice (e.g. SSH), scan the build output for errors (like we already do in CDT), and map them back to the EFS resources in your project. That way you can do all the standard things, like click on an error in the Problems View from your remote compile and be whisked away to the corresponding source location.
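
As a rough illustration of the error-scanning piece, here's a minimal sketch assuming gcc-style diagnostics. The path mapping and class names are hypothetical; the real CDT error parsers are pluggable and considerably more robust:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: scan build output for gcc-style diagnostics of the
// form "file.c:42: error: message", then map the remote path back to
// a workspace (EFS) path so a problem marker can be created on it.
class BuildOutputScanner {

    // e.g. "/home/build/proj/src/foo.c:42: error: 'bar' undeclared"
    private static final Pattern DIAGNOSTIC =
            Pattern.compile("^(.+?):(\\d+):\\s*(error|warning):\\s*(.*)$");

    // Hypothetical mapping from the remote build directory to the
    // corresponding project in the local workspace.
    static String toWorkspacePath(String remotePath) {
        return remotePath.replaceFirst("^/home/build/proj", "/MyRemoteProject");
    }

    static void scanLine(String line) {
        Matcher m = DIAGNOSTIC.matcher(line);
        if (m.matches()) {
            String file = toWorkspacePath(m.group(1));
            int lineNumber = Integer.parseInt(m.group(2));
            // The real builder would create a problem marker on the
            // EFS resource here; this sketch just reports the mapping.
            System.out.println(m.group(3) + " at " + file + ":"
                    + lineNumber + " - " + m.group(4));
        }
    }

    public static void main(String[] args) {
        scanLine("/home/build/proj/src/foo.c:42: error: 'bar' undeclared");
    }
}
```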

The builder is in fact probably the most important feature to have. The "I" in IDE really isn't there if you can't build anything. Source introspection tools are nice, but if your tool can't build my source, chances are I'm not going to use it.

At any rate... it's looking like it's going to be a busy summer...

4 comments:

Denis Roy said...

If you say NFS is slow over 100Mb connections, then perhaps you've set up either the client, the server, or both wrong.

Your posting on cdt-dev doesn't reveal much information, so I assume you simply exported the directories and mounted them remotely with the default settings.

All too often I've heard developers call some server component 'slow' (for instance, a database server) when in fact it's their lack of knowledge that is the biggest factor (such as a poor SQL query).

Mike Kucera said...

I know several C developers who do all their work by SSHing into a server and coding in vi or emacs.

I have a feeling that if RDT lives up to its promises it could become very popular.

Chris Recoskie said...

Denis,

The sysadmins are the ones that took care of the export, so I can't comment there.

I mounted them using NFSMaestro on my Windows box.

I'm not saying that in general NFS was slow. It was a lot faster than Samba when I tested the same scenario on it. But trying to do this particular task over NFS was slow, due to the nature of the operation itself. It is not something you want to do on a mounted filesystem. That is my claim.

Scott Lewis said...

Hi Chris,

It might be useful for you to consider using ECF for parts of what you are doing with RDT...we have support for efs, ssh, http, others for filetransfer (in p2) as well as multiple transport support for data channels, remote services, dynamic service discovery and others.