Frequently Asked Questions (FAQ)

 

Shutdown of the Genome@Home project

What? Why?
Please see our news post at http://www.stanford.edu/group/pandegroup/genome/new.html .

What will happen to the stats?
When the final GAH WUs are accepted (April 15, 2004), we will publish the final GAH stats snapshot and keep the GAH website alive.

What will happen to clients which have the gah preference selected?
We have set up a server for FAH work units without any deadlines. Only clients with the GAH preference selected are assigned to this server, which takes the place of the GAH server. The switchover occurred on March 12, 2004.

How do "deadline-less" WUs work?
In FAH, we normally need a deadline because new WUs are made from old ones (FAH follows a trajectory, and we can't know the next step until the previous one is found). For deadline-less WUs, we can't generate the next one, so we'll only do work where we don't need long trajectories, i.e., where we are interested only in the part of a trajectory that can be done in a single WU. There aren't many projects like this, but there are enough to justify one server running them.

What will happen to the GAH results?
We are working to make the GAH results publicly accessible to all in a variety of formats, including a BLAST search. The GAH results will continue to have scientific usefulness long after the last WU has been returned.

What will happen to the new GAH core?
We are continuing work on the new GAH core (M3). M3 is built on Tinker 4.x and is poised to replace both the work that the GAH core performs and the Tinker core. The science in these two cores has been converging for some time, so it's natural to combine them. We have no ETA for this release; it would appear as a new FAH Tinker core.

Should we now switch to F@H team numbers? My team has team numbers on both GAH and FAH.

After April 15, we will no longer accept GAH WUs and the final GAH stats will be published on our web site.

If no team changes are made in the config, will the existing G@H teams become F@H teams, and will the individual results show on the F@H stats?

No. There is no fair way to merge GAH and FAH stats, since they are inherently different.

What about machines that have to be turned off on nights and weekends (in which case it should still be set on genome@home)?

For machines which won't make FAH deadlines (slower CPUs or machines which aren't on frequently), the "no deadline" server is the best choice. You can select it by selecting GAH in the client.




Older Topics:

For questions regarding G@h Classic and Folding@home 3.0, please go here


Project details, big picture:

Who "owns" the results? What will happen to them?

Unlike other distributed computing projects, Genome@home is run by an academic institution (specifically the Pande Group, at Stanford University's Chemistry Department), which is a non-profit institution dedicated to science research and education.

The results from Genome@home will be made available on several levels. First, we put statistics and information about the protein sequences being designed on the web for everyone to see. These are updated daily, and include information about which users contributed which sequences. Second, analysis of the sequences will be submitted to scientific journals for publication, and these journal articles will be posted on the web page after publication. Third, after publication of the scientific articles that analyze the data, the raw data will be available for everyone, including other researchers, here on this web site.

How can I see how many other people are participating? What has been "folded" so far? And how much have I folded so far?

We keep many types of statistics of users and work accomplished on our web page. Check out our main statistics page here. We also keep track of how much work has been done, how many users are signed up and who's currently running here. You can see how many units you have processed so far (as well as all other users) on this page. There's a lot of information, so browse around and see what you can find. Note that not all of the statistics are updated automatically, so there may occasionally be some discrepancies.

How is this project supposed to help us understand "real" genomes and proteins?

Genome@home studies real genomes and proteins directly, by designing new sequences for existing 3-D protein structures, which come from real genomes. The protein structure files that are sent out as work contain the Cartesian atomic coordinates of a protein. This data was obtained experimentally through X-ray crystallography or NMR techniques. Note that this was not done by us; thousands of scientists have spent decades compiling this data, which is generously made freely available to the public. By designing new sequences that could form these specific protein structures, we're setting the stage to attack a number of significant contemporary issues in structural biology, genetics, and medicine. For example, the Genome@home data will be used to:

  • Try to unravel a fundamental issue in the "protein folding problem" (which itself lies at the heart of a huge amount of modern biomedical research): the fact that thousands of different sequences can all form the same three-dimensional structure.
  • Predict the functions of newly discovered genes and protein structures. Modern approaches to structural biology, known as "proteomics" or "structural genomics", often solve protein structures without knowing what the proteins do. Because techniques for function prediction tend to work best with large amounts of sequence data, a virtual library of sequences for a new protein structure will be an invaluable resource.
  • Potentially design and make new versions of existing proteins for use in medical therapy.

Versions:

What's new in version 0.91?

Both the Windows and Linux versions are new.

i) Caching - If you can't connect to the Genome@home server (it's down, or you're offline) when it's time to deposit finished work, you'll get a few PutWork error messages. After twenty minutes, Genome@home will give up, store the results on disk, and rerun the work units you've already got. Since the results are different each time Genome@home runs, this is completely equivalent to downloading brand new work units. The client will continue rerunning work units (potentially for weeks, if you're on vacation or something) until it is able to connect to the server, at which time it will upload all the results to the server, and you will get credit for the total of all the work units processed.

ii) Checkpointing - Genome@home now checkpoints itself after each of the 30 sequence design iterations. If you complete the design of 23 sequences and your computer crashes, or you Ctrl-C G@H, or there's a California rolling blackout :-), or whatever, the client will continue on and start designing the 24th sequence once it's restarted. However, even if you were 90% done designing sequence 24 when Genome@home was stopped, it still has to start over again from the beginning with sequence 24. Thus, the most you'll lose is the time it takes to design one of the 30 sequences (roughly an hour or so for most machines and proteins).
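
The checkpointing behaviour described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the client's actual code: the function names, the JSON checkpoint file, and the design_one callback are all hypothetical. Only the count of 30 sequences and the "resume at the first unfinished sequence" rule come from the text.

```python
import json
import os

TOTAL_SEQUENCES = 30  # from the FAQ: 30 sequence design iterations per work unit


def load_checkpoint(path):
    """Return the list of sequences finished so far (empty if no checkpoint exists)."""
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        return json.load(fh)


def save_checkpoint(path, finished):
    """Write to a temporary file, then rename, so a crash cannot leave a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(finished, fh)
    os.replace(tmp, path)


def run_work_unit(path, design_one, stop_after=None):
    """Design the remaining sequences, resuming from the checkpoint.

    design_one(index) is a hypothetical stand-in for one design iteration.
    stop_after simulates an interruption before the sequence with that index.
    """
    finished = load_checkpoint(path)
    for index in range(len(finished), TOTAL_SEQUENCES):
        if stop_after is not None and index >= stop_after:
            break  # simulated crash: partial work on this sequence is lost
        finished.append(design_one(index))
        save_checkpoint(path, finished)  # checkpoint after every completed sequence
    return finished
```

In this sketch, a run that stops after 23 completed sequences restarts with the 24th, so at most one sequence's worth of work is repeated.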

What's new in version 0.93?

Both the Windows and Linux versions are new for version 0.93.

The Linux version is recompiled to accommodate older processors, such as Pentium II and AMD K-6, which were not supported in earlier Linux versions.

The filenaming error (causing some results to not be sent in) found in version 0.91 has been resolved. Also, a bug in the loop counting and reseeding of the random number generator in the protein design algorithm itself has been resolved.

Both versions will now attempt to send back any finished results upon (re)start-up of the client. The client will also get new work, save it, allow any unfinished work to finish, and then start new work. If both a new and an old work unit are already present, no new work will be downloaded.
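
Taken together, the result caching introduced in 0.91 and the upload at start-up described above behave like a queue of finished results that is flushed whenever the server can be reached. The Python sketch below is a rough illustration of that idea, not the client's implementation: deposit_results and the upload callback are hypothetical names, and the queue lives in memory for brevity, whereas the client stores results on disk.

```python
def deposit_results(cache, new_result, upload):
    """Queue new_result, then try to upload everything that is queued.

    cache is a list of results waiting to be sent (oldest first).
    upload(result) is a hypothetical callback that raises ConnectionError
    when the server cannot be reached.
    Returns the number of results uploaded by this call.
    """
    cache.append(new_result)
    uploaded = 0
    while cache:
        try:
            upload(cache[0])
        except ConnectionError:
            break  # server unreachable: keep the remaining results for later
        cache.pop(0)  # remove a result only after it was sent successfully
        uploaded += 1
    return uploaded
```

Because a result is removed from the queue only after a successful upload, nothing is lost if the connection drops partway through.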

What's new in version 0.98?

Both the Windows and Linux versions are new for version 0.98. This version of the client will not process pre-0.98 work units. When upgrading, it is best to allow an old client to finish a work unit, then install the new version.

The client has a number of additional data integrity checks which verify the integrity of the work unit before it's processed and before the results are returned. The identifying information for each work unit is more closely tied to the work unit and maintained by checksum verifications. The client will reseed a rerun work unit with a random, rather than incremented, 32-bit seed. All these changes were designed to eliminate the possibility of duplicating work units or fabricating false results. The client will also warn the user if it was shut down improperly or if another instance is already running in the same directory.
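
As a rough illustration of two of these ideas, the Python sketch below draws a random 32-bit seed and checks a work unit against a stored checksum. It rests on assumptions: the FAQ does not say which checksum or random source the client uses, so CRC-32 and Python's secrets module are stand-ins, and the function names are hypothetical.

```python
import secrets
import zlib


def new_seed():
    """Return a random 32-bit seed instead of incrementing the previous one."""
    return secrets.randbits(32)


def checksum(payload):
    """Checksum of the work unit bytes (CRC-32 is an assumption, not the client's algorithm)."""
    return zlib.crc32(payload)


def is_intact(payload, expected):
    """True if the payload still matches the checksum recorded for it."""
    return checksum(payload) == expected
```

A random seed makes two reruns of the same work unit very unlikely to repeat each other, and a checksum mismatch signals that the work unit or its results were altered or corrupted.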

The client maintains a rudimentary screen log of its progress (scrlog.gah), with timestamps at each step. This logging system may be changed/augmented in the future.

A number of new features have been added in the form of command-line flags; these also appear separately in the Windows Start menu.

  • -config: Runs the initial configuration step of the client, to allow users to change their username, etc.
  • -upload: Uploads all completed results, then shuts down.
  • -clear: Deletes any corrupt work units which are crashing the client, and restarts.
  • -nonet: Reruns a current work unit indefinitely, without attempting to make a network connection.

The client will attempt to get new work more often than previous versions. After three failed attempts, it will try to rerun any current work units. The wait-time between get work attempts and put work attempts has been reduced to two minutes.
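
The retry policy above can be sketched in a few lines of Python. This is an illustration under assumptions rather than the client's code: fetch_or_rerun and get_work are hypothetical names, and whether the client also waits after the final failed attempt is not stated, so this sketch does not.

```python
import time

GET_WORK_ATTEMPTS = 3   # from the FAQ: three failed attempts before rerunning
WAIT_SECONDS = 2 * 60   # from the FAQ: two minutes between attempts


def fetch_or_rerun(get_work, current_unit, sleep=time.sleep):
    """Try to fetch new work; after three failures, fall back to the current unit.

    get_work() is a hypothetical callback that raises ConnectionError on failure.
    Returns a (work_unit, kind) pair where kind is "new" or "rerun".
    """
    for attempt in range(GET_WORK_ATTEMPTS):
        try:
            return get_work(), "new"
        except ConnectionError:
            if attempt < GET_WORK_ATTEMPTS - 1:
                sleep(WAIT_SECONDS)  # wait before the next attempt
    return current_unit, "rerun"
```

Passing the sleep function as a parameter keeps the sketch easy to try out without actually waiting two minutes.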


Networking problems:

I have a modem, can I use Genome@home?

Yes, the Genome@home client will work with most modem setups. It will give an error message if it tries to connect when you are not online, but it will continue re-trying every five minutes. Once you go online, it will be able to connect to the Genome@home server.

I'm behind a firewall, can I use Genome@home?

If you are behind a firewall, please answer yes at the "firewall" dialog box, and then give the client some info about your firewall.

Not all firewalls are supported. Also, please make sure that SOCKS is running.


Errors

Is the server down? Nothing is happening (or) it's giving lots of network error messages.

Occasionally the server goes down, but the clients (console and screen saver) are designed to wait for the server to come back up and then go from there. You don't need to do anything; this should happen automatically. The client does wait several minutes each time it tries to connect, so don't worry if it sits there for 5 or 10 minutes (or occasionally even longer).

I keep getting an error message. It crashes right after it starts. What's wrong?

Sometimes, things will just plain go wrong with the client. Usually, all you need to do is delete the file "input.inp" from the Genome@home directory on your computer, and restart the client. This will force it to get rid of the bad work unit and get a new one, which almost always solves the problem.
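
If you prefer to script this step, a minimal Python sketch follows. It assumes you know where the Genome@home directory is on your machine; the function name is hypothetical, and only the file name "input.inp" comes from the text above. Stop the client before running it, and restart the client afterwards.

```python
from pathlib import Path


def remove_bad_work_unit(client_dir):
    """Delete input.inp from the given Genome@home directory.

    Returns True if the file was found and removed, False if it was not there.
    """
    target = Path(client_dir) / "input.inp"
    if target.is_file():
        target.unlink()
        return True
    return False
```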

Why does it stop after "Initializing protein design algorithm"?

It hasn't stopped; this second stage of the algorithm just takes a while. It could take up to an hour on slow machines.

Genome@home looks strange (Windows) or segfaults (Linux)

Genome@home requires at least 32 MB of RAM. Weird things happen under Windows with less memory.

Windows asks for some DLL. Where can I find it?

Microsoft has these DLLs on their site. In particular, you need the DLLs for Winsock2. These are built into most copies of Windows NT, 98, and 2000. However, many copies of Windows 95 do not have them.

The Windows Sockets 2 update for Microsoft Windows 95 resolves a number of Winsock2 issues. This update also resolves a number of TCP/IP stack issues.

I get an error like "Network Recv Timeout"

If you get something like:

Network Recv Timeout

GetWork Failed

then don't worry. It is having problems connecting to the server, and is waiting to try again. If it fails to connect for a day or so, it might be best to start it over again or reinstall. Hit Ctrl-C to exit gracefully, and start it again.


Running

What does the output on my screen mean?

Genome@home tells you how it's progressing through your work unit. It starts off with a huge variety of possibly good sequences, and iteratively searches through and refines these sequences, until a well-designed sequence is found. The core of the design algorithm repeats itself thirty times, each time producing one "best" sequence. After thirty iterations, Genome@home will send the data back to the server and get more work.

Does Genome@home run on dual processor machines?

Yes. Genome@home supports dual processor machines. You just need to run two copies of Genome@home, each installed into its own directory.

Why should I update my Genome@home software to the current version?

We are constantly and rapidly improving the Genome@home software. We release new versions to fix bugs reported by the users to help make the project run as smoothly as possible.

How do the results get back to you?

Your computer will automatically upload the results to the Genome@home server each time it finishes a work unit, and download a new job at the same time.

How much is a work unit?

We define a work unit as the complete design of one 100-amino-acid protein sequence. This generally takes about a day or two on an average computer. The protein you get sent may well be shorter or longer than 100 amino acids, and we calibrate for the size of your protein sequence when we calculate work statistics.
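
To make the idea of calibration concrete, here is a small Python sketch. It assumes the simplest possible rule, credit proportional to protein length relative to 100 amino acids. The FAQ does not give the actual formula used for the statistics, so treat this purely as an illustration; the function name is hypothetical.

```python
REFERENCE_LENGTH = 100  # from the FAQ: one work unit = one 100-amino-acid design


def calibrated_work_units(protein_length):
    """Credit for one designed sequence, assuming linear scaling with length.

    Linear scaling is an assumption made for illustration only.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return protein_length / REFERENCE_LENGTH
```

Under this assumption, a 150-residue protein would count as 1.5 work units and a 50-residue protein as 0.5.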

How can I make sure my results are being sent back and used? How can I tell how much work I've processed?

To see what data has been reported back, check our user stats page. If your computer is returning data, you should see your username there along with the number of work units completed. If your name isn't there yet, you probably just haven't sent back any work units yet.

Can I run Genome@home when SETI@home and/or Folding@home is running?

Yes. Genome@home should run fine while SETI@home and/or Folding@home is running, assuming that you have enough RAM for both.


Statistics, teams, usernames

How are the emails/usernames used?

The e-mail addresses collected by Genome@home are never distributed to any other organization. We are a non-profit research group at Stanford University, and we have no commercial interests. A confirmation email will be sent to the address when you first download Genome@home, or if you start a Genome@home team. Infrequently, we may send a message about new version upgrades or exciting news about the project. If you don't wish to receive further emails, you can opt out on your user page (not yet available).

How can I change my username?

The simplest way to change your username is to uninstall, then download and reinstall Genome@home. It also is an opportunity to upgrade to our latest version, which should run more smoothly and interact with our servers better. Please give the install program the desired username and all new work units completed will be associated with that username.

How can I join/create a team?

To start a team, go here. To join a team, you need to enter your team's account number in the "account" dialog box that appears the first time you run Genome@home. If you're already running Genome@home, you should uninstall, download a new version, and re-install. To see how your team is doing, check here.

Why don't all my work units show up on my team's stats page?

Only work units "labelled" with your team's account number will add to the team total. Make sure that your team's account number appears in the ghclient.cfg file on your computer(s). Any work units that you completed before joining a team will not count towards the team total.

I'm running multiple machines behind a firewall. Can they all have the same user name?

Yes. The Genome@home server will assign each machine a unique cpu id. You can enter the same user name in the "group name" dialog box on every machine onto which you install Genome@home.

Are there any characters I should avoid in a user name?

You can use anything except whitespace (space, tab, etc.). If you want a space in your user name, use an underscore "_".
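
The rule is simple enough to check before you type a name into the client. The Python sketch below is an illustration only; the function names are hypothetical and nothing here is part of the Genome@home software.

```python
import re


def is_valid_username(name):
    """True if the name is non-empty and contains no whitespace characters."""
    return bool(name) and re.search(r"\s", name) is None


def normalize_username(name):
    """Trim the ends and replace each remaining whitespace character with an underscore."""
    return re.sub(r"\s", "_", name.strip())
```

For example, normalize_username("Jane Doe") gives "Jane_Doe", which passes is_valid_username.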

What happened to the cpudays and hours/unit stats?

Because the new client allows caching and checkpointing, the calculation of the rate at which a user processed work units becomes difficult, buggy, and slightly irrelevant. In the interests of a stable client and server, and smooth user interface for the client and website, we've stopped calculating these statistics.


Misc:

Where did the logo come from?

The Genome@home logo is a combination of three elements. The name itself is presented in bold, colourful letters, floating slightly above the rest of the logo. The two-colour helix represents a string of DNA, the molecule which makes up genes and genomes. This element also appears in the logo of our partner project, Folding@home.

What about security issues?

The Genome@home client software is available for download only from this web site. We do not support Genome@home software obtained elsewhere, and we would appreciate it if you would notify us if other people are offering the software for download. This software will upload and download data only to and from our data server here at Stanford. The data server doesn't download any executable code to your computer. In fact, the Genome@home client is much safer than the browser you're running to read this!

Why no Mac/Solaris/etc version?

We're anticipating requests for other versions for various operating systems. Porting the client to other platforms should be easier now that we've had some practice with Folding@home.