JB: Welcome to the CREN Tech Talk series for spring of 2001 and to this session on "State of the Art: Peer to Peer Networking." You are here because it's time to discuss the core technologies for your future campus. This is Judith Boettcher, your CREN host for today, and our session today is coming to you with the support of the CREN member institutions and Dell computers. Look on the Dell site for a free white paper on Demystifying Windows NT Security.
I'd like to welcome Howard Strauss of Princeton - back from the Caribbean - who is the technology anchor for Tech Talk. Howard is a well-known web technology expert and portal expert. Welcome back, Howard.
HS: Thank you, Judith. I'm Howard Strauss, the technology anchor for the Tech Talk series of technology webcasts. Today, we'll engage our guest expert, David Anderson, in a lively technical dialogue that will answer the peer-to-peer networking questions you'd like answered and will ask those very important follow-up questions. You can ask your own questions by sending e-mail to expert@cren.net anytime during this webcast. If we don't get to your questions during the webcast, we'll provide an answer in the webcast archive.
Now that the human genome has largely been decoded, the next problem, which is orders of magnitude more complex, is to understand protein folding. A better understanding of how proteins fold will give scientists and doctors insight into diseases that will enable those diseases to be cured or better managed.
The protein folding problem, however, is so computationally intensive that it will require the fastest computers imaginable. What is the fastest computer today? Well, the answer may depend upon who you ask. IBM thinks it is their Blue Gene computer. It will be the world's fastest supercomputer and is being designed specifically to handle problems such as protein folding - hence its name. Blue Gene will execute instructions at the rate of one petaflop or one thousand trillion floating point operations per second, and is a thousand times more powerful than Deep Blue, the IBM chess-playing computer that beat chess champion Garry Kasparov in 1997.
Other people might say that the fastest computer is the hundreds of millions of computers on the Internet, working together on the same problem. For some problems, we have already done just that. SETI At Home, for example, the ubiquitous screen saver that analyzes signals from outer space in search of extraterrestrial intelligence, combines over 2,000,000 computers to do the work of a giant supercomputer. SETI At Home is just one example of peer to peer computing, a computer network architecture that aggregates the power and resources of distributed systems.
Some scientists are looking at other problems that might be amenable to this approach, including the protein folding problem that I mentioned before. By using otherwise wasted CPU cycles and unused disk space on large collections of computers, it is possible to use hordes of small machines to attack certain kinds of complex problems that lend themselves to running in this very special mode.
Some problems cannot be sped up at all by this kind of parallelism. For example, while we all know that one woman can produce one baby in nine months, there is no way that nine women can produce one baby in one month!
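Howard's nine-months example is the extreme case of what computer architects call Amdahl's law: if only a fraction of a job can be spread across machines, the serial remainder caps the speedup no matter how many machines join. The sketch below is simply the textbook formula in Python, not anything SETI At Home or Blue Gene publishes; the function name and example fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only `parallel_fraction` of a job can be spread
    across `workers` machines and the rest must run serially (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# A purely serial job (the nine-months case) gains nothing from more workers.
print(amdahl_speedup(0.0, 9))  # 1.0
# Even a million machines cannot push a job that is 5% serial past 20x.
print(amdahl_speedup(0.95, 1_000_000))
```

This is why the problems that suit the SETI At Home style are the ones whose serial fraction is close to zero.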
Although we are tempted to look at the enormous number of unused CPU cycles and disk space on all Internet computers as an untapped treasure, we must be sure that the return justifies both the social and economic costs of recovering them, and that the use of this resource is socially responsible. The US Navy estimates that a cubic mile of seawater contains tens of thousands of dollars of gold, but the cost of recovering this gold and the environmental impact of doing so far exceed any benefit we'd get from it. Leo Szilard, in his book The Voice of the Dolphins, imagined intelligence in sea-dwelling mammals. David Anderson, today's expert, thinks it might be found in extraterrestrials, and B.F. Skinner skeptically suggests that "the real question is not whether machines think, but whether men do."
The real challenge seems to be how we can best harness the great computational power we now have and apply it in the best and most responsible ways. Peer to peer networking may prove to be a powerful tool to do that. We'll look at some of the amazing possibilities on today's webcast of Tech Talk. Judith?
JB: Thank you, Howard. What a fascinating introduction here, and as we've been looking at peer to peer networking and the whole concept of how the network is evolving, I think we're going to have some exciting ideas about creating almost an international social brain just filled with all these computers. But let's let our expert today, David Anderson, tell us more about where he's coming from on this and how it all might work.
We're very pleased to welcome David Anderson to our CREN Tech Talks. David is the Director of the SETI At Home project at UC Berkeley and also the Chief Technology Officer of United Devices. David has roots in Wisconsin where he received his Ph.D. in Computer Science from the University of Wisconsin at Madison, and then David moved on to the University of California Berkeley where he is today. David's research interests include operating systems, distributed computing and computer music, and in his spare time, he's an avid classical pianist and rock climber. An interesting combination, David! Thanks for being here, and welcome!
DA: Thanks, Judith!
HS: David, perhaps we could start by you just telling us what you think peer to peer computing or - I'm sorry, I said peer to peer computing. Peer to peer networking is.
DA: Well, the phrase has a strict interpretation, which is distributed systems that lack any kind of central coordinator and are basically democratically organized, where each computer is the equal of the others. Kind of a serverless distributed system. But in practice, I think we need to interpret it a little bit differently, because the projects like SETI At Home and Napster that the term is usually applied to do, in fact, have a central coordinator. In the case of SETI At Home, there's actually no direct peer to peer communication. So I think-
HS: So it's the idea that there's not a central thing is - that's not always a guide as to whether it's peer to peer?
DA: No, I think what characterizes systems like SETI At Home and Napster is their use of resources that are kind of at the edges of the Internet. In other words, they're harnessing power that is on home PCs and PCs in offices and schools, rather than on servers sitting in machine rooms somewhere.
HS: Do we have to be talking about the Internet? Could we have peer to peer computing on an intranet?
DA: Sure. Anytime you have a network that can communicate data between storage and processors, you have that possibility. So a term I like to use, which doesn't have quite the catchiness of peer to peer, is edge resource aggregation - the idea that you're pulling together resources of one sort or another at the edge of a network rather than the middle of it.
HS: How does this peer to peer stuff fit in with the other stuff we've seen, with mainframes and client/server and all the other kinds of computing that we've seen?
DA: Well, I think to some extent you can think of peer to peer, or more accurately, edge resource aggregation, as being the next big paradigm in how to make computer systems. The major periods in computer history were the mainframe era, back in the old days when there was a big computer behind a glass wall, and of course at some point, somebody thought the world would only ever need five computers.
And then we moved on to desktop computing in the 80s, when it became economically possible for people to have their own computer and control their own destiny. More recently, in the 90s, came the client/server architecture, exemplified by things like the web. There the main resource, the data being served or the computing being done, is back in the big powerful computer in the machine room. You still have a computer on your desktop, but its main use is the user interface: showing you graphics and basically talking to the user. The extreme version of that was called the thin client paradigm, where the computer on your desk was really stripped down and need not even have a disk, for example. The people pushing that were, of course, the people selling the powerful central computers.
HS: Normally we wait a little while before we bring in any e-mail questions, but since you just mentioned this idea of IBM mentioning five computers was all we were going to need - I guess that was Thomas Watson who said that - and we just happen to have a question that picks up that theme. Perhaps we'll read it to you now. This is from Stan Johnston, and Stan says, "Do you see us all as being nodes on the global computer? Perhaps IBM was right way back when they estimated that we'll only need five computers." Stan says, "Perhaps the ultimate supercomputer is a global peer to peer supercomputer." What do you think of this comment?
DA: Well, I think that's the way things are moving. Of course, it would be more like five billion computers instead of five.
HS: But I guess Stan is suggesting that when you look at a big aggregation of all these things, it looks like one big computer.
DA: Yeah.
HS: Just with distributed processors.
DA: Any time you have barriers that limit the tasks a given resource can be applied to, that's less efficient than a system where any resource can be used for anything. So I think the forces of economics are pushing things in the direction of peer to peer. These days, the computer that a consumer buys for maybe $1,000 is an incredibly powerful machine. You can get a gigahertz processor and 30 or 40 gigabytes of disk space. Generally, it's way more than that person actually needs for their own use. If you use your computer to browse the web or read e-mail, it's basically sitting there idle about 99% of the time. When you hit a key or move your mouse, there's a little blip of activity. Except for that, it's just sitting there.
HS: But why would people give up these extra resources? I drive the highways around here and I see all these SUVs with one person sitting in them. They clearly have more SUV than they need, but they don't give up the extra resources. What incentive do people have to give up these extra resources on their computers to do this peer to peer stuff?
DA: There's a highly complex answer to that. There are many possible reasons, and it varies a lot with the person and their own motivations. I think the next few years are going to see a very thorough exploration of how researchers or commercial companies can try to get access to those resources.
HS: Right, but you're involved with this SETI At Home. Maybe you could tell us more about SETI At Home and how it fits in with SETI and it seems like you said, what, 2.8 million people are using SETI At Home now?
DA: Yep!
JB: And actually while you're doing that, David, I'd like to go back and make certain that we kind of differentiate between perhaps a model of computing, either in the peer to peer realm where you've got some machine functioning as perhaps a mothership or something, doing some things, as opposed to totally peer to peer.
DA: Well, I'll just start off by describing what SETI At Home does and how it works.
JB: Okay.
DA: The science that we're doing is what's called Radio SETI: listening to radio waves coming from other stars in the Milky Way galaxy. Radio waves have the nice property that they pass through matter like interstellar dust clouds, and it's long been thought that if another civilization wanted to communicate with us, or is giving off radio waves because of its own communication, that would be our best chance of detecting evidence of intelligence outside of Earth. It turns out that, scientifically, it comes down to a computing problem.
What we're looking for are what are called narrowband signals, signals like those of radio and TV stations, where the energy is packed into a small frequency range. In discovering those signals, basically, the more computing power you can throw at the problem, the better you can hear: the fainter the signal you have a chance of detecting.
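As a toy illustration of why more computing lets you hear fainter narrowband signals (this is not SETI At Home's actual pipeline, which is considerably more involved), the sketch below hides a weak tone in random noise and finds it with a discrete Fourier transform. The tone's energy piles up in a single frequency bin, growing roughly with the square of the transform length, while the noise power per bin grows only linearly, so longer and more expensive transforms pull out fainter tones. All signal parameters here are hypothetical.

```python
from __future__ import annotations

import cmath
import math
import random


def power_spectrum(samples: list[float]) -> list[float]:
    """Power in DFT bins 0..N/2, computed naively in O(N^2).
    Fine for a toy; real signal-processing code would use an FFT."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        powers.append(abs(acc) ** 2)
    return powers


def strongest_bin(samples: list[float]) -> int:
    """Index of the frequency bin with the most power, skipping DC (bin 0)."""
    powers = power_spectrum(samples)
    return max(range(1, len(powers)), key=powers.__getitem__)


# Hypothetical test signal: a tone of amplitude 0.5 buried in Gaussian noise
# of standard deviation 1, so it is hard to see sample by sample.
random.seed(1)
N, TONE_BIN = 1024, 37
signal = [
    0.5 * math.cos(2 * math.pi * TONE_BIN * t / N) + random.gauss(0.0, 1.0)
    for t in range(N)
]
print(strongest_bin(signal))  # expected: 37, far above the noise floor
```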
Previously, all Radio SETI projects used their own supercomputers. Most of them actually built special purpose supercomputers to do this kind of signal processing. And they were limited because you can only get so much computing power.
We took the other approach: we record the data digitally at the telescope (we use the telescope at Arecibo, Puerto Rico) and distribute it through the Internet to a screensaver program we developed that does the first phase of the signal processing. Then we collect the results back at our server. So there is a central server that collects the data, distributes it and collects the results; there's really no alternative to that architecture, but our computing power is totally distributed. This has given us an amount of computing power that not only exceeds any previous SETI experiment, but actually exceeds all other computers, period.
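The architecture David describes, a central server handing out independent chunks of data and collecting the results, can be sketched as a simple work queue. This is a minimal illustration under assumed names (`WorkServer`, `next_unit`, `report` and the re-issue deadline are all hypothetical, not SETI At Home's actual server code). It also shows the property Howard raises next: a unit whose volunteer disappears can simply be handed to someone else.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkUnit:
    unit_id: int
    payload: bytes
    issued_at: Optional[float] = None  # when it was last handed out
    result: Optional[bytes] = None


@dataclass
class WorkServer:
    """Hypothetical central coordinator: hands out independent work units,
    collects results, and re-issues units whose client has gone silent."""

    deadline_s: float
    units: dict[int, WorkUnit] = field(default_factory=dict)

    def add(self, unit_id: int, payload: bytes) -> None:
        self.units[unit_id] = WorkUnit(unit_id, payload)

    def next_unit(self, now: Optional[float] = None) -> Optional[WorkUnit]:
        """Return a unit that was never sent, or whose deadline has passed."""
        now = time.monotonic() if now is None else now
        for unit in self.units.values():
            if unit.result is not None:
                continue  # already finished
            if unit.issued_at is None or now - unit.issued_at > self.deadline_s:
                unit.issued_at = now
                return unit
        return None  # everything is either done or still in progress

    def report(self, unit_id: int, result: bytes) -> None:
        self.units[unit_id].result = result

    def done(self) -> bool:
        return all(u.result is not None for u in self.units.values())


server = WorkServer(deadline_s=3600)
server.add(1, b"chunk-1")
server.add(2, b"chunk-2")
first = server.next_unit(now=0)    # unit 1 goes to one volunteer
second = server.next_unit(now=1)   # unit 2 goes to another
server.report(second.unit_id, b"no candidate signals")
# The volunteer holding unit 1 disappears; after the deadline it is re-issued.
retry = server.next_unit(now=3602)
print(retry.unit_id)  # 1
```

Because the units are independent, the server never has to coordinate volunteers with each other; it only tracks who has what and for how long.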
HS: This is a very special kind of problem, though, in that the computers working on this problem don't have to communicate with each other.
DA: That's right. I can discuss in a bit the properties of different computational problems that make them more or less amenable to wide-scale distribution. SETI is certainly ideal for that because, as you say, each one of these chunks of data can be analyzed in isolation. You don't need to know the results of another chunk before you can start. And also-
HS: Even if one fails, it doesn't stop the other ones, so you can just start her up again! Or give it to somebody else.
DA: Right. Yeah. And in fact, we analyze each piece of data a minimum of two times and usually three.
HS: Because? Why would you do that?
DA: Well, for checking purposes. It turns out that even though computers compute the right answer most of the time, sometimes they err. I think part of this may have to do with people over-clocking their computers to try to make them run faster, which is something a lot of hobbyists do. When you do that, the first thing to go is floating point accuracy, so we get back a small percentage of results that are wrong, and the only way to tell is by comparing them with other results for the same data. The other nice property of this SETI problem is that you send a relatively small amount of data, about a third of a megabyte, and that keeps a fast computer occupied for about 15 or 20 hours. Given the slow speed of a lot of people's Internet connections, that's an important property.
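One way to implement the cross-checking David describes is a quorum rule: accept a result only when at least two independently computed copies agree, and otherwise send the work unit out again. The sketch below assumes results are lists of floating-point numbers compared within a relative tolerance; the tolerance value and the function names are hypothetical choices, not SETI At Home's actual validation rules.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def results_agree(a: Sequence[float], b: Sequence[float], rel_tol: float = 1e-5) -> bool:
    """Two results for the same work unit agree if they have the same length
    and every value matches within a relative tolerance (hypothetical value)."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def canonical_result(results: list[Sequence[float]], quorum: int = 2) -> Optional[Sequence[float]]:
    """Return a result that at least `quorum` copies agree on (each copy counts
    as agreeing with itself), or None if the unit needs to be sent out again."""
    for candidate in results:
        votes = sum(results_agree(candidate, other) for other in results)
        if votes >= quorum:
            return candidate
    return None


good = [1.0, 2.5, 3.25]
overclocked = [1.0, 2.5, 9.0]  # a corrupted floating-point result
print(canonical_result([good, overclocked]))  # None: no quorum yet
print(canonical_result([good, overclocked, [1.0000001, 2.5, 3.25]]))  # [1.0, 2.5, 3.25]
```

A tolerance is needed because honest machines with different processors and compilers can still differ in the last few bits of a floating-point result.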
HS: And it sounds, then, like people could also do some processing offline.
DA: Right.
HS: They could pick up the data, get off the network, and continue to work.
DA: That's right. Takes a couple of minutes to get your data and then you can get off the Internet and stay off for 24 hours.
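A quick back-of-the-envelope check of the compute-to-transfer ratio, assuming a work unit of roughly 350,000 bytes ("about a third of a megabyte"), a nominal 56 kbit/s modem, and the low end of David's 15 to 20 hour estimate. Real modem throughput and protocol overhead make the download slower than this idealized figure, which fits "a couple of minutes".

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Idealized time to move size_bytes over a link (no protocol overhead)."""
    return size_bytes * 8 / link_bits_per_s


WORK_UNIT_BYTES = 350_000    # assumed: "about a third of a megabyte"
MODEM_BITS_PER_S = 56_000    # assumed: nominal 56k modem
COMPUTE_SECONDS = 15 * 3600  # low end of "15 or 20 hours"

download = transfer_seconds(WORK_UNIT_BYTES, MODEM_BITS_PER_S)
print(f"download ~{download:.0f} s, compute:transfer ~{COMPUTE_SECONDS / download:.0f}:1")
# download ~50 s, compute:transfer ~1080:1
```

A ratio of roughly a thousand to one is what makes it practical to connect briefly, fetch a unit, and work offline for the rest of the day.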
JB: Does that work in the background, or does it turn on and off depending if a user is using their computer?
HS: Well, it's a screen saver, so you know-
JB: Ah! You're right. Okay.
DA: It can actually work either way. Normally, it computes only when you're not using your computer, like a screen saver. You can optionally have it work in the background all the time. It runs at a low priority, and as long as you have enough memory that it doesn't cause paging, you won't notice its effect on your performance.
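Running "in the background at a low priority" maps onto a standard operating-system mechanism. On Unix-like systems, Python's `os.nice` raises a process's nice value so the scheduler favors interactive programs. The wrapper below is a hypothetical sketch of how a background client might use it, not SETI At Home's code.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def run_at_low_priority(work: Callable[[], T], increment: int = 19) -> T:
    """Lower this process's scheduling priority, then run `work`.

    os.nice exists only on Unix-like systems; elsewhere this sketch just runs
    the work at normal priority. An unprivileged process can raise its nice
    value (lower its priority) but cannot undo that afterwards."""
    if hasattr(os, "nice"):
        os.nice(increment)
    return work()


def crunch() -> int:
    # Stand-in for one slice of signal processing (hypothetical workload).
    return sum(i * i for i in range(1_000_000))


print(run_at_low_priority(crunch))
```

Priority only governs CPU scheduling. As David notes, if the client's memory use pushes the machine into paging, the user will still notice it.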
HS: Okay, we have another interesting question here from Richard Danielson. And Richard says, "If I were to dump my client/server approach and set up a peer to peer network for my classes, what would it look like and how would it work? How would I still maintain instructional control of the class?" Richard says he's a professor and he likes control.
JB: Most of us do, right?
DA: Well, I guess it's an issue of what exactly he is going to use that peer-to-peer network for. A general issue with distributed systems like this that run on computers that are out of your control and which you don't necessarily trust is the issue of privacy and reliability. For example, in SETI At Home, as with a lot of distributed computing problems, the data that we're sending to people is kind of something we don't want them to have direct access to. At SETI At Home, we were worried that if the format of our data was publicized, that an overzealous alien hunter would announce their own signal discovery and create a PR fiasco for us.
HS: Couldn't somebody just take your data anyway and modify it before you processed it?
DA: In general, if you need to have privacy and control over your data, there are mechanisms that use encryption that keep people from looking at or modifying your data, even though it's residing on their computer.
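The "modifying" half of David's answer can be illustrated with a message authentication code: the server tags each work unit with a secret key that only it knows, and any change to the data makes the tag fail to verify. This sketch uses Python's standard `hmac` module; the key handling is a simplifying assumption. Note that a MAC detects tampering but does not hide the data; keeping a participant from reading data that their own computer must process is a harder problem.

```python
import hashlib
import hmac
import secrets

# Hypothetical: a secret that lives only on the project's server.
SERVER_KEY = secrets.token_bytes(32)


def tag(data: bytes, key: bytes = SERVER_KEY) -> str:
    """HMAC-SHA256 tag the server attaches to a work unit."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, received_tag: str, key: bytes = SERVER_KEY) -> bool:
    """True only if `data` is byte-for-byte what was tagged.
    compare_digest avoids leaking information through timing."""
    return hmac.compare_digest(tag(data, key), received_tag)


unit = b"work unit 1042: raw telescope samples"
unit_tag = tag(unit)
print(verify(unit, unit_tag))                 # True
print(verify(unit + b" tampered", unit_tag))  # False
```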
HS: Can we talk about some of the problems that are appropriate for peer to peer?
DA: Sure.
HS: You were starting to talk about some of the characteristics of the problems. In the case of SETI At Home, we had this thing where we're processing really independent sets of data that were not dependent at all. Is that necessary for peer to peer?
DA: Not necessarily. As a kind of a background to this, I should point out that there's several dimensions to tasks that computers do, and computing is only one of them. The way that I think of things is that there's computing, there's storage and there's network communication. A given task is some mixture of all three of those.
There are, of course, extreme examples. There are computing tasks that are purely computational and require only very small amounts of data to be transmitted or stored: things like mathematical problems, looking for prime numbers, breaking encryption schemes and so on. At the other extreme are tasks like storing and serving data. You could potentially think about replacing web servers with distributed storage or, taking that one step further, building a system for storing and distributing movies and TV programs in some high-resolution digital form, potentially storing every TV program ever recorded, and implementing that by using a distributed storage-
HS: Like Napster. Isn't Napster an example of that kind of thing?
DA: Napster is kind of a prototype. You'd have to solve various technical problems which Napster doesn't address, including the copyright and legal problems of preventing unrestricted access to these files. But applications that involve storage are very interesting.
In the same way that SETI At Home has this huge amount of computing power, greater than any existing supercomputer, the amount of unused disk space on the 100 million or so Internet-connected devices today, a number projected to grow to about one billion in two or three years, lets us do a lot of really interesting things. One of them is simply storing amounts of data that we haven't been able to conceive of before, like all movies and TV programs. In addition, we can use a technique called replication, where we store multiple copies of each data item. Even though the individual computers may be relatively unreliable, because people turn them off at night, or they just go away, or they're not available because they're in use, we can achieve pretty much arbitrary degrees of availability by choosing the right degree of replication.
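The replication argument can be made concrete. If each host is online independently with probability p, a data item stored on k hosts is unreachable only when all k are offline, so its availability is A = 1 - (1 - p)^k. The sketch below finds the smallest k for a target availability. The independence assumption is the weak point: machines that are all switched off at night fail together, so a real system would likely need more copies than this formula suggests.

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable,
    assuming each host is online independently with probability p_online."""
    return 1.0 - (1.0 - p_online) ** replicas


def replicas_needed(p_online: float, target: float) -> int:
    """Smallest number of copies whose availability meets `target`."""
    if not (0.0 < p_online <= 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < p_online <= 1 and 0 <= target < 1")
    k = 1
    while availability(p_online, k) < target:
        k += 1
    return k


# Hosts that are up only half the time still give "three nines" with 10 copies.
print(replicas_needed(0.5, 0.999))  # 10
```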
JB: That's interesting. So in other words, we could in fact achieve predictability of access, even though we would be dealing with many perhaps unreliable and unpredictable behaviors.
HS: If we had enough redundancy.
JB: Yeah, if we had enough redundancy.
DA: And there's kind of a complementary technique called striping where you can take a file and split it up into a lot of separate parts, store them in different computers and when you want to read that file, you can kind of read them in parallel from all those computers. And even if the network connections of the individual computers are relatively slow, like modem or DSL, you can get an aggregate bandwidth where your total throughput for reading the file again can be made arbitrarily high. So this ability to use basically software techniques to give you as reliable and as fast a storage server as you want is quite intriguing for storage systems.
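Striping, as David describes it, works much like RAID 0 across a network: consecutive chunks of a file are dealt round-robin to different peers and fetched in parallel. In this sketch the "peers" are in-memory lists and `fetch` is a hypothetical stand-in for a network read; the chunk size is deliberately tiny for readability. In practice the aggregate read rate is roughly the sum of the peers' upload rates, but it remains capped by the reader's own connection.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, peers: int, chunk: int = 4) -> list[list[bytes]]:
    """Deal fixed-size chunks round-robin across `peers` (RAID 0 style):
    chunk j goes to peer j % peers."""
    stripes: list[list[bytes]] = [[] for _ in range(peers)]
    for start in range(0, len(data), chunk):
        stripes[(start // chunk) % peers].append(data[start:start + chunk])
    return stripes


def fetch(stripes: list[list[bytes]], peer: int) -> list[bytes]:
    """Hypothetical stand-in for a (slow) network read from one peer."""
    return stripes[peer]


def read_striped(stripes: list[list[bytes]]) -> bytes:
    """Read all peers in parallel, then re-interleave their chunks in order."""
    with ThreadPoolExecutor(max_workers=len(stripes)) as pool:
        parts = list(pool.map(lambda p: fetch(stripes, p), range(len(stripes))))
    ordered = []
    for row in range(max(len(part) for part in parts)):
        for part in parts:
            if row < len(part):  # the last row may be incomplete
                ordered.append(part[row])
    return b"".join(ordered)


original = b"the quick brown fox jumps over the lazy dog"
print(read_striped(stripe(original, peers=3)) == original)  # True
```

Striping and replication combine naturally: each stripe can itself be replicated, so a slow or missing peer can be replaced by another copy of the same stripe.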
HS: Are there any other applications that pop up as being particularly interesting for peer to peer? I mean, what you mentioned about distributing all the movies or all the CDs sounds kind of interesting.
DA: There's a whole slew of computational problems. The one you mentioned of protein folding. There's other things like gene sequence analysis and virtual drug design, where you basically replace a laboratory with a computer simulation of a chemical reaction.
HS: So is there a SETI At Home-like thing for protein folding out there now, or is there about to be one?
DA: Actually, there is. And they chose the highly original name of Folding At Home. It's a project-
HS: Sounds like origami.
JB: Sounds like it!
DA: It's a project based at Stanford and I actually run it on my laptop, heretical though that may be. There are other - there are proposals for really interesting science research projects like global climate prediction, trying to use computers to figure out if we're going to destroy the earth by global warming. Species extinction prediction. A lot of people are looking into new techniques for financial data analysis that use distributed computing. And another example is graphics rendering for movies. There's all these problems where companies are either buying supercomputers or more often building huge roomfuls of dedicated PC type computers. So these things can potentially be done faster and better with distributed computing.
HS: So I mean, to folks who are going off and building this big IBM computer, Blue Gene, which is a one-petaflop machine that's going to do protein folding, do you feel that that's really not the way to go, that it's going to be - whatever you called this thing, the folding thing - Folding At Home. Is Folding at Home going to be more effective, do you think, than Blue Gene? Are they going to be complementary? How do you think-
DA: Well, you know, I don't know the details of Blue Gene and I'm sure that they have good reasons for doing what they're doing. There will always be problems that require so much data to be moved between processors or which require such tight synchronization between processors that they require the giant supercomputer approach, but my feeling is that that range of problems will shrink continuously, in particular as the Internet becomes faster and approaches the gigabit per second speed of Local Area Networks. That may shrink almost to zero.
JB: Would one possible use be that if one uses the huge distributed computing model for a lot of the potential searching research, but then when you think you've got a solution, you would use the large single computer to double check or recheck results on a larger distributed project?
DA: Yeah, another similar situation is in virtual drug design, where you're basically sifting through a database of millions of chemical compounds, looking for one that might react with your cancer cell protein. You use that technique to basically pare down the list. In the end, of course, you actually have to synthesize the chemical and try it out in a test tube and eventually try it out in subjects. So yeah, it is useful for screening.
HS: Back to this idea of why somebody would do this. With SETI At Home, I assume that people who are running the screen saver believe they're actually doing something good, like contributing to charity or something. Somehow they're helping to find extraterrestrial intelligence. I assume that's why people are doing it.
DA: I was very intrigued by that question. I was caught a little bit off guard by the runaway success of SETI At Home, and about a year ago, I put an online poll on the SETI At Home website that asked a bunch of questions, including "Why are you participating in SETI At Home?" There was a pretty good mixture of responses. About 50% of the people, I think, were doing it because they wanted to help find extraterrestrial life, but there are also a lot of people who view it as a way of benchmarking their computer. I think there's a certain class of people, maybe the modern equivalent of American hot rodders, who want to have the fastest car on the block. A lot of people want to have the fastest computer, and SETI At Home gives them a way of demonstrating that and showing it to the world in a public place. If they can get on to our Top 100 User list, then they've [inaudible].
HS: Tell me about those incentives. I had no idea that they existed. I don't run SETI At Home. So you're providing some kind of incentives, it sounds like little contests or things like people who play video games or something like that. Like get their initials somewhere if they score real high? How are you doing that, or what are you doing?
DA: Our website shows lists of top users broken down by a lot of - you know, the top overall and broken down by country. We let people form teams that are categorized into school, small company, large company and we show lists of these top teams.
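The leader boards David describes amount to simple aggregations over per-user credit records. The sketch below is hypothetical (the `Credit` fields and sample names are invented for illustration) and only shows the idea of ranking users overall or by country and summing credit by team.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple, Optional


class Credit(NamedTuple):
    user: str
    country: str
    team: str
    work_units: int


def top_users(credits: list[Credit], n: int = 3, country: Optional[str] = None) -> list[str]:
    """Names of the n users with the most completed work units,
    optionally restricted to one country."""
    rows = [c for c in credits if country is None or c.country == country]
    rows.sort(key=lambda c: c.work_units, reverse=True)
    return [c.user for c in rows[:n]]


def top_teams(credits: list[Credit], n: int = 3) -> list[str]:
    """Teams ranked by the total work units of their members."""
    totals: Counter[str] = Counter()
    for c in credits:
        totals[c.team] += c.work_units
    return [team for team, _ in totals.most_common(n)]


credits = [
    Credit("ann", "US", "Lincoln High", 120),
    Credit("raj", "IN", "Pune Coders", 300),
    Credit("li", "US", "Lincoln High", 200),
    Credit("max", "DE", "Pune Coders", 10),
]
print(top_users(credits))                # ['raj', 'li', 'ann']
print(top_users(credits, country="US"))  # ['li', 'ann']
print(top_teams(credits))                # ['Lincoln High', 'Pune Coders']
```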
JB: You actually have students in the elementary and high school forming teams to help do this?
DA: Yeah! There's, I think, about 40,000 teams right now all over the world. A lot of them are in the primary school and secondary school categories and one of our reasons for doing that was to provide a way for students in one class to discover and communicate with students elsewhere and talk about SETI At Home, maybe do science fair projects based on it. But that mechanism has also turned into a way for these computer performance hot-rodding types to gain a place high on the list.
HS: So you turned it into kind of like a contest or a game or something like that and so people just are going out and enjoying it.
DA: Yeah, and it's good for us and it's good for them.
HS: Could you talk a little bit about the difference between systems that have some kind of central point of control and those that don't? For example, Napster and Gnutella both distribute music, but my understanding is that Napster does have a central point of control and Gnutella doesn't.
DA: Right.
HS: Which one's better? Do they both work? When would you use one or the other? Are there downsides/upsides of these things?
JB: And while he's answering that, Howard, let me remind everyone to send in more questions. Now is a good time to send them to expert@cren.net.
HS: And one just came in!
JB: And one just came in! Oh! That was fast!
HS: This is like when you do these fund drives, right? And the phones are ringing. It's too bad that folks who are listening can't hear the e-mail coming in.
JB: All right, David, do you remember the question where we were going?
HS: We were talking about the difference between Gnutella and Napster, based on the fact that one of those, Napster, has a central control thing and Gnutella doesn't.
DA: Yeah. Very roughly, the way Napster works is that they have a server that maintains a list of the songs all Napster users have on their disks. If you want to download a song, your Napster program contacts their server and gets back a list of people, sorted in some way, who have that song. When you download it, your computer connects directly to their computer. So the Napster server does not hold the audio, and I think there was some hope that that would excuse them from copyright-related issues. The actual audio is only on the end computers, the peer to peer computers; their central server is like a directory, a phone book of where to find songs.
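David's phone-book description of Napster's server can be sketched as an index that maps song titles to the peers holding them, without ever storing audio. This illustrates the idea under assumed names; it is not Napster's actual protocol.

```python
from __future__ import annotations

from collections import defaultdict


class Directory:
    """Napster-style central index: records which peers hold which songs.
    It never stores audio; downloads go directly from peer to peer."""

    def __init__(self) -> None:
        self._holders: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, peer: str, songs: list[str]) -> None:
        for song in songs:
            self._holders[song].add(peer)

    def unregister(self, peer: str) -> None:
        """Forget a peer that has gone offline."""
        for holders in self._holders.values():
            holders.discard(peer)

    def lookup(self, song: str) -> list[str]:
        """Peers that have the song. Sorted here only for deterministic output;
        a real service might rank by connection speed instead."""
        return sorted(self._holders.get(song, set()))


index = Directory()
index.register("alice", ["song-a", "song-b"])
index.register("bob", ["song-b"])
print(index.lookup("song-b"))  # ['alice', 'bob']
index.unregister("alice")
print(index.lookup("song-b"))  # ['bob']
```

The central index is what lets the system make a globally informed choice of source, and it is also the single point that can be shut down.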
HS: So when you want to play music in Napster, you go to the central directory and it knows where everything is.
DA: Right. Gnutella lacks that central structure. For a real-world analogy: if you want to get a song, you ask the ten people closest to you, and if one of them has it, they give it to you. Otherwise, they ask the ten people closest to them, and so forth.
HS: And do they somehow keep track so that people who are already asked don't get asked again?
DA: Yeah, there's various ways of optimizing this distributed lookup procedure, but as it scales, people have done performance studies and the Gnutella approach for locating files basically hits a performance wall when the system reaches a certain size. There are some advantages in having centralized information so you can make a globally optimal decision about which copy should be sent.
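Gnutella-style lookup floods a query outward hop by hop, with a time-to-live (TTL) that limits how far it travels, message IDs that let peers drop duplicates, and a rule that a peer does not send a query back to the neighbor it came from. The sketch below simulates that on a small hypothetical graph, using a single shared `seen` set in place of each peer's record of the IDs it has already forwarded. Counting messages shows why flooding hits the performance wall David mentions: every query costs traffic on links throughout its TTL radius, however few peers actually hold the file.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Optional


def flood_search(
    graph: dict[str, list[str]],
    start: str,
    has_song: Callable[[str], bool],
    ttl: int,
) -> tuple[list[str], int]:
    """Simulate TTL-limited query flooding.
    Returns (peers holding the song, number of query messages sent)."""
    seen = {start}
    hits: list[str] = []
    messages = 0
    frontier: deque[tuple[str, int, Optional[str]]] = deque([(start, ttl, None)])
    while frontier:
        peer, hops_left, sender = frontier.popleft()
        if peer != start and has_song(peer):
            hits.append(peer)
        if hops_left == 0:
            continue  # TTL exhausted: this peer does not forward further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # do not echo the query back to where it came from
            messages += 1  # sent even if the neighbor will drop it as a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1, peer))
    return hits, messages


# Hypothetical five-peer network; only peer "e" has the song.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
holders = {"e"}
print(flood_search(graph, "a", holders.__contains__, ttl=2))  # ([], 4)
print(flood_search(graph, "a", holders.__contains__, ttl=3))  # (['e'], 6)
```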
JB: So in that respect, Napster is more similar to the SETI At Home structure, then.
DA: In a way, yeah.
HS: Are there some problems that are particularly suited to this kind of structure that Gnutella has? I mean, are there problems that would work better in the Gnutella mode without a central control rather than a thing with a central control?
DA: I think the main purpose of having no central control in that case is simply that there's nothing to shut down. It's impossible to suppress.
HS: So the government could never go after Gnutella, or it'd have to go after people one at a time.
DA: It would be extremely difficult.
HS: Because they'd have to do people one at a time.
DA: Right.
HS: They couldn't find the central control because there is no center.
DA: Yeah.
JB: Should we take a couple of the questions, Howard?
HS: Yeah, sure.
JB: How about the one that came in from - let's see-
HS: Michael Setzer?
JB: Phil Weldon, actually.
HS: Sure.
JB: Phil is from Mindspring.com and he's asking, "How applicable is the experience with a volunteer distributed computing system such as SETI At Home to for-profit distributed computing projects and networks?" And he mentions one such as Napster, but I understand that there's also another company - is it Juno?
HS: Juno.
JB: -that's trying to do this. And then he continues with the second half of the question, saying, "What about maintaining the trust of the user base?"
DA: There are several companies that are basically trying to commercialize or monetize the SETI At Home model. I happen to work for one of them, United Devices.
Juno is an ISP that's trying to figure out how to get some money out of all the people they're providing free ISP service to by using their computers. That has its own set of issues, of course, with respect to the privacy rights of those people. There is an alternative model of licensing the technology into corporate intranets, but for companies going after consumer computers on the Internet, there are two problems: first, how to motivate people to run your program, and second, how to maintain their trust. Companies such as United Devices are contemplating a mixture of approaches. One is to provide a mixture of tasks for your computer, some of which are good-of-humanity things like cancer research, and some of which are commercial things that pay the bills for the company. They're also trying different forms of paying people, either direct per-CPU-hour payments or micropayments, which tend to be very small. Another approach is to run sweepstakes with larger prizes, where your chances of winning are based on how much work your computer got done.
I think that the lesson from SETI At Home of the power of the competitive spirit of computer owners is something that these companies are also taking note of and providing similar mechanisms of rankings and leader boards and so forth.
HS: But as I understand the Juno model, Juno is going to choose what thing runs on your home computer, so you're not going to have this association to the thing that's being done. With SETI At Home, you say, "Great, I'm doing a search for extraterrestrial intelligence. That sounds good to me." Or if you were doing this folding, protein folding thing, you might decide that protein folding is a good thing. But with Juno, it sounds like you wouldn't know what was running. Do you think they're going to have a more difficult time selling that?
DA: I think they'll have - I think it'll be impossible to sell that! I personally feel that a mandatory condition is that users always be made aware of what is running on their computer.
HS: And that they control the little door that lets it in and out so they can choose the things.
DA: Or at the very least, they can turn it off if they don't like what it - I think there has to be complete disclosure, you know. If I own a computer, I do not want to allow the possibility that it might be used for nuclear weapons research or something like that.
HS: Or tobacco research or something, whatever you don't like.
DA: Yeah. So I think a good compromise is, first, full disclosure and, second, giving users some control, maybe letting them select the application areas they'll allow to run or exclude the few they don't want.
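The compromise David describes, full disclosure plus user control over application areas, can be read as a simple allow/exclude filter applied before a task is accepted. The sketch below is hypothetical: the `Task` and `UserPreferences` types, the category names, and the rule that an exclusion always wins over an allowance are illustrative assumptions, not how SETI At Home, Juno, or United Devices actually work.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    category: str  # disclosed to the user before the task runs (assumed field)

@dataclass
class UserPreferences:
    allowed: set = field(default_factory=set)   # empty set means "any category is fine"
    excluded: set = field(default_factory=set)  # always refused, even if also allowed

    def permits(self, task: Task) -> bool:
        # Exclusions take precedence over allowances (an assumed policy).
        if task.category in self.excluded:
            return False
        return not self.allowed or task.category in self.allowed

def filter_tasks(tasks, prefs):
    """Return only the tasks this user has agreed to run, in original order."""
    return [t for t in tasks if prefs.permits(t)]
```

A user who only wants to rule out one area sets `excluded`; a user who wants to opt in to specific research sets `allowed`.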
HS: And do you think that - I mean, with SETI At Home, you're running little contests and things like that. You mentioned that on some of these things, you're going to have to give people micropayments and things. I mean, do you think the micropayment model is going to work? Could you charge for these cycles on this distributed supercomputer?
DA: When you work out the economics of micropayments, the amounts of money are small enough that I don't think they would motivate individual consumers. It's on the order of a dollar a month, maybe. There's a modified form of that where you put in some randomness and some variance and make it into a sweepstakes where you can advertise, you know, maybe a $10,000 prize each month. The same amount of money could be distributed, but it seems like more. It's kind of exploiting the gambling or lottery behavior of consumers.
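The sweepstakes point is arithmetic: a lottery can pay out the same total as per-user micropayments while advertising a much larger prize. A minimal sketch, assuming a hypothetical pool of 10,000 equally productive participants and a single monthly drawing weighted by CPU hours; David's figures are only "on the order of a dollar a month" and "maybe a $10,000 prize," and the weighting scheme is an assumption.

```python
import random

def micropayment(pool: float, n_users: int) -> float:
    """Equal split of the monthly pool among all participants."""
    return pool / n_users

def expected_winnings(prize: float, my_hours: float, total_hours: float) -> float:
    """Expected value of one drawing in which the chance of winning is
    proportional to the CPU hours a user contributed (assumed rule)."""
    return prize * my_hours / total_hours

def draw_winner(cpu_hours: dict, rng: random.Random) -> str:
    """Pick one winner, weighting each user by contributed CPU hours."""
    users = list(cpu_hours)
    return rng.choices(users, weights=[cpu_hours[u] for u in users], k=1)[0]
```

With equal contributions, each participant's expected winnings equal the one-dollar micropayment; only the variance differs, which is what makes the headline prize feel larger.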
JB: Interesting!
DA: There's also the corporate case. If we think about using this model for corporations, most of which own large numbers of desktop PCs, we can think about the possibility of a corporation installing this agent software on all of its computers.
HS: You're referring to something like Condor?
DA: Like Condor or any of these commercial products. A company that has 1,000 or 5,000 or 10,000 computers has a resource on its hands that it could either use for its own computing purposes or basically rent to other companies to do computing on. There may be some companies that don't really have supercomputing needs but own a lot of desktop PCs, and this would give them a way to monetize those.
HS: How does all this stuff play on the PDAs that people are starting to carry and the wireless cell phones and all the other wireless things that people have? Are we going to have SETI At Home on my Palm Pilot?
DA: Well, one thing about SETI At Home is that the porting of it, the conversion of the program to run on different platforms, has all been done by volunteers, and we've actually gotten offers to port SETI At Home to cell phones, which have processors in them, and to hand-held devices. We haven't done that, and for the foreseeable future, it's probably not an interesting thing to do.
HS: Because?
DA: Well, mostly because those devices are engineered to use as little power as possible and when you're not actually using them, they go into low power mode where their clock rate goes way down and they can't really do a lot of work. So [inaudible]
HS: [inaudible] wake them up and just burn the batteries down.
DA: Yeah, you really wouldn't - the end result would be that your batteries would run down really quickly. I don't think most people would want that.
JB: Let's go back, perhaps, David, to when we were talking about the whole model of SETI. We've been talking about the actual PC resource, and we've had a comment come in from a colleague at Berkeley, Jack McCready, who wants us to shift our attention to the amount of bandwidth a SETI At Home project requires. He comments that there's just a huge amount of bandwidth usage going on with this kind of project. Would you comment on that, please?
DA: Yeah. With SETI At Home, even though, like I said, the amount of data you have to send per unit of computing is pretty small, it still adds up when you're talking about 2.8 million users. Our servers distribute data at an average rate of about 20 megabits per second, day and night, and that makes us the largest user of outgoing network bandwidth from the UC Berkeley campus. We've actually had to negotiate with the campus network people to put a limit on the amount of bandwidth we use. As people explore different problems, the bandwidth will be even higher, and I think there will have to be some controls that keep backbones and ISP lines from getting clogged up by data-intensive problems. The real issue is that network traffic currently is not prioritized. There's no way of saying, "This connection or this packet has lower priority than normal."
HS: But people are doing that, David. I mean, there are people who have routers that do Quality of Service, so they prioritize packets.
DA: Well, as soon as next-generation IP gives us an end-to-end way of expressing that, I think it'll be much more feasible to do problems that would otherwise clog people's networks.
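The 20 megabits per second figure turns into daily volume with a line of arithmetic. The per-user number below divides by all 2.8 million users, which is an assumption: not every user is active on a given day, so the real figure per active user would be higher.

```python
SECONDS_PER_DAY = 86_400

def daily_gigabytes(rate_bits_per_s: float) -> float:
    """Convert an average rate in bits/s to gigabytes (10**9 bytes) per day."""
    return rate_bits_per_s * SECONDS_PER_DAY / 8 / 1e9

def per_user_kilobytes(rate_bits_per_s: float, users: int) -> float:
    """Average daily volume per user in kilobytes (10**3 bytes)."""
    return daily_gigabytes(rate_bits_per_s) * 1e9 / users / 1e3

if __name__ == "__main__":
    print(f"{daily_gigabytes(20e6):.0f} GB/day")                    # 216 GB/day
    print(f"{per_user_kilobytes(20e6, 2_800_000):.0f} KB/user/day")  # 77 KB/user/day
```

Small per user, but roughly 216 GB leaving one campus every day.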
HS: That would seem to solve a problem.
DA: Yeah.
HS: If these SETI At Home transfers were just lower priority, they would only sop up network bandwidth that wasn't being used anyway. That would seem to make everybody happy.
DA: Yeah, the next-generation Internet, which will carry digital media like television, will have to provide those Quality of Service mechanisms, and those will help distributed computing a lot.
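Background traffic that only "sops up" unused capacity is essentially strict-priority sharing. The sketch below models a single link with a hypothetical capacity and two traffic classes; it illustrates the allocation rule only and is not how any particular router or next-generation IP Quality of Service mechanism implements it.

```python
def allocate(capacity: float, normal_demand: float, background_demand: float):
    """Strict priority on one link: normal traffic is served first, up to
    the link capacity; low-priority (background) traffic gets only what is
    left over. Returns (normal_rate, background_rate)."""
    normal = min(normal_demand, capacity)
    background = min(background_demand, capacity - normal)
    return normal, background
```

When normal demand is low, background transfers run at full speed; when the link is saturated, they drop to zero, so ordinary users never see them.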
JB: David, again on that bandwidth problem, have you looked at locating a server someplace other than UC Berkeley to distribute those costs, or does the software really require having the server in one location?
DA: We've certainly looked into that. We've had a lot of offers from companies to donate bandwidth to us. The problem is our setup: we get these tape cartridges, and we have dedicated machines that we plug the tapes into, which divide them into work units. We would have to replicate that whole system and start mailing tapes somewhere else, and that was complicated enough that we haven't done it.
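The tape-to-work-unit step amounts to cutting a long recording into independent pieces that can be handed out and processed separately. A hypothetical sketch, assuming fixed-size, non-overlapping chunks and a made-up ID scheme; the real SETI At Home splitter's parameters are not described in this talk.

```python
import io
from typing import BinaryIO, Iterator, Tuple

def split_into_work_units(recording: BinaryIO, unit_size: int,
                          tape_id: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (work_unit_id, payload) pairs of at most unit_size bytes each.
    The ID format "<tape_id>-<6-digit index>" is an illustrative assumption."""
    index = 0
    while True:
        chunk = recording.read(unit_size)
        if not chunk:  # end of the recording
            break
        yield f"{tape_id}-{index:06d}", chunk
        index += 1

if __name__ == "__main__":
    demo = io.BytesIO(b"abcdefghij")  # stands in for data read off a tape
    for unit_id, payload in split_into_work_units(demo, 4, "tape1"):
        print(unit_id, payload)
```

Because each unit is self-contained, a second site would need its own copy of this pipeline and its own tapes, which is the replication cost David mentions.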
JB: I see. Okay. Howard, we've got a couple other questions.
HS: Yeah, we have a question from Michael Setzer, which is kind of an interesting question. Michael's in Guam, and his question is about the geographic distribution of SETI At Home users. He said, "I was just wondering what the distribution of users across the world is and how it compares with what David expected."
DA: Do you want to comment that it's already Friday, 7:30 AM where he's coming from here?
HS: I've just gotten off airplanes myself, so I have no idea. For me, it could be 7:30 AM on Friday here, Judith!
JB: Okay. David, was the question clear enough for you?
DA: Yeah. The distribution is amazingly international.
HS: What I would wonder is, are you getting more - you know, a bigger percentage of, say, the people in Guam than you are in the United States because - not to say bad things about Guam - because there are fewer things going on in Guam than in, say, New York City?
DA: Well, in rough terms, about 50% of our computing power comes from the United States. Second is Canada, then several European countries. There are 226 countries represented in total. We were actually very curious about this also, so in the list of countries on our website, you can view it sorted either by the total computing power each country contributes or by computing power per capita. The leaders by that second measure are places like Antarctica, where most of the people are scientists who run SETI At Home. So there are a number of small countries outside the United States that are doing a disproportionate amount of work for SETI At Home.
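The two views of the country list, by total computing power and by computing power per person, are two sort keys over the same data. The figures below are invented placeholders for illustration only, not SETI At Home statistics.

```python
from typing import NamedTuple

class CountryStats(NamedTuple):
    name: str
    cpu_years: float   # total computing contributed (hypothetical units)
    population: int

def by_total(stats):
    """Largest total contribution first."""
    return sorted(stats, key=lambda s: s.cpu_years, reverse=True)

def by_per_capita(stats):
    """Largest contribution per resident first."""
    return sorted(stats, key=lambda s: s.cpu_years / s.population, reverse=True)

# Placeholder numbers, not real data.
sample = [
    CountryStats("Large country", 1_000_000.0, 280_000_000),
    CountryStats("Small research station", 50.0, 1_000),
]
```

A tiny population of enthusiastic users can top the per-capita list while barely registering in the totals.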
HS: Okay, we have another question whose timing is just perfect, because toward the end of our webcasts, we want to ask questions just like this. So thank you, Mary. We have a question from Mary Grush at [inaudible], and she says, "In what areas..." Actually, it's a couple of questions, but the first one is, "In what areas will higher education institutions specifically apply peer to peer networking?"
DA: I think the first area will simply be in computational research in some-
HS: When you say that, you mean to learn more about how you do distributed peer to peer networking?
DA: No, rather that many areas, especially scientific areas, are increasingly using computing to do research, either by doing simulations or types of data analysis that require large amounts of computing power, and there are a lot of people whose research is limited by how much computing power they can get. The possibility that a university might be able to pool all of its computers to help its own researchers, or maybe even form a consortium as is being done with the various grid computing initiatives, will enable types of research that simply haven't been feasible until now.
JB: I think we actually had an example of that from Penn State, gosh, sometime last year, talking about the Unix computers, large Unix computers being used for shared computing of that sort. It's kind of neat.
HS: Yeah, Mary has another question here, and she says, "Are higher education institutions in a unique position to create peer to peer strategies?"
DA: Well, any organization, whether it's a university or a company, has the property that there's administrative control. Whoever owns the computing laboratory with 1,000 PCs in it can mandate that a peer to peer program be installed on all of them or removed, as the case may be. So there's the ability to enforce that something be done. Yeah, one of the interesting things with projects like Folding At Home and SETI At Home is that we can't force anybody to run these. It's all just on the basis of convincing people that we have meritorious research.
What really interests me is not so much small groups of computers in universities or companies, but rather tens or hundreds of millions of computers on the Internet. I think that in the long run, the only way to get access to those is to convince people that you're doing something interesting or worthwhile. So long term, one effect on research may be that if you want a lot of computing power, you have to be able to sell your research to the public. You have to be able to explain what it is, first of all, and get people to think that it's worth doing or important.
HS: So you need a story in Time magazine or Newsweek or something to get folks excited about this?
DA: Or just a viable grassroots marketing phenomenon like SETI At Home.
HS: What are some of the promising developments in the future of peer to peer computing or peer to peer networking? See any interesting things coming?
DA: I think it'll be real interesting when there are platforms available so that scientists who may know how to program but not be experts in putting together large software systems like SETI At Home are able to participate in this. And there's a lot of flux right now, a lot of people jockeying for position in terms of controlling these standards, but it'll be a very good thing when those standards exist and somebody who has interesting research to do doesn't have to go through all the pain we went through in SETI At Home to build a platform and the servers and the screen saver program, but can just plug that application into a framework.
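Here is one way "just plug that application into a framework" might look: the framework owns scheduling, networking, and the screen saver, and the scientist supplies only a function from a work unit to a result. This is a hypothetical interface sketched with Python's `abc` module; `ScienceApp`, `run_client`, and the toy `ByteSum` application are illustrative names, and none of it describes Condor, Globus, or any standard that existed at the time.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable

class ScienceApp(ABC):
    """The only part a researcher would write (hypothetical interface)."""

    @abstractmethod
    def process(self, work_unit: bytes) -> bytes:
        ...

def run_client(app: ScienceApp, fetch: Callable[[], Iterable[bytes]],
               report: Callable[[bytes], None]) -> int:
    """Framework side: fetch work units, run the app on each one, and
    report the results. Returns the number of work units processed."""
    count = 0
    for unit in fetch():
        report(app.process(unit))
        count += 1
    return count

class ByteSum(ScienceApp):
    """Toy application: 'analyze' a work unit by summing its bytes."""

    def process(self, work_unit: bytes) -> bytes:
        return str(sum(work_unit)).encode()
```

In a real framework, `fetch` and `report` would talk to project servers; here they are plain callables so the separation of concerns is visible.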
HS: Okay, I think we're getting very close to the end here, so I'm going to ask the question we often ask very close to the end.
JB: That's good timing, Howard!
HS: Okay. And that is, what should universities be doing right now about peer to peer computing? Should they be building peer to peer applications, or should they be building walls to keep peer to peer applications out? What should they be doing right now?
DA: I think that universities should be tracking the technological progress of projects like Condor and Globus, which is sort of a larger scale version of the same thing, and trying to make sure that paths of communication exist between the researchers in that university and this technology. One of the problems of technology in general is deploying it and putting it in the hands of the people who can make the most use of it. And there is the converse problem that you mentioned of dealing with outside things like Napster that have the effect of clogging up campus networks or tying up computer time, things like that, and that has certainly been a headache here at UC Berkeley, dealing with the network ramifications of Napster. And I think that'll continue to be a problem, even if it shifts from Napster to something else.
JB: David, it sounds as if, given the impact on campus networks, that researchers should be working closely with their IT folks if they want to start exploring this kind of a project. Is that fair to say?
DA: Yeah, I think it's important for universities to make sure that they have some kind of centralized IT function that both manages the resources of the university and also puts researchers in touch with evolving technology.
JB: Okay. Howard, are you ready for a wrap up? Do you have a final question or comment?
HS: I always have a final question here! We could go on, really, for a lot longer.
JB: I know!
HS: It's just really neat stuff.
JB: It is!
HS: Really. I'm really sorry we don't have another hour here, but since you're so involved with SETI At Home, do you think that SETI At Home is going to actually discover signs of extraterrestrial intelligence? And if you do think so, when do you think that's going to happen? Ten years? Fifty? A hundred?
DA: Well, I should say that I'm more or less the computer guy in the SETI At Home project, not the astronomer. So my opinions aren't worth very much.
HS: We'd like to hear them anyway!
DA: My feeling is that there's a very small chance in the two-year span of SETI At Home, maybe at most a one percent chance, of finding a signal. I think it's worth doing, both because of the huge importance of finding a signal - it's sort of like winning the lottery - and also because I'm just interested in the technology and making it available for other purposes. I guess I would say that I think that in the next hundred years, we have about a 50% chance of hearing aliens, but that's just a total random guess.
JB: That assumes that they are.
DA: Most people who study this are pretty confident there probably is life outside of Earth, but even if there is, even if it has technology, there are real problems in detecting them because of the distances between stars and the amount of background noise.
JB: Okay, interesting. Well, all right, with that, Howard, is it okay to close now?
HS: Yeah!
JB: Okay. First of all-
HS: There might be extraterrestrials [inaudible] too.
JB: Hey, it looks like we won't have them on Tech Talks, anytime soon anyway.
HS: Oh, well.
JB: Oh, well! Be sure to plan - I'd like to invite everyone who's listening to be sure to plan on joining us two weeks from today, on April 5, when our special guest experts will be Calvin Lowe from Bowie State University and Karen Coyle from the California Digital Library. And our subject is "E-Books and E-Shelves." And that's certainly something that I know I can use!
With many thanks to the CREN member institutions who support these Tech Talks and to Dell Computing for their partial support of today's event. Also many thanks to our Tech Talk expert, David Anderson; to technology anchor, Howard Strauss; to Terry Calhoun, our Tech Talk web guru; to Jason Russell, Gayle Terkeurst and the support team at Merit Network; to Susie Berneis who consistently transcribes those wonderful audio files here; and also a thanks to all of you for being here. You were here because it's time. Bye, David. Bye, Howard.
HS: Bye, Judith. Bye, David. This was great.
JB: See everyone in two weeks.
DA: Bye.
JB: Bye-bye.
END OF WEBCAST