Judith Boettcher [JB]
Howard Strauss [HS]
Kevin Morooney [KM]
Research Computing and Linux Clusters
March 16, 2000
JB: Welcome to the CREN TechTalk series for spring of the new millennium, and to this session on Research Computing and Linux Clusters. You are here because it's time to discuss the core technologies for your future campus. A special thanks goes to Myricom for partial sponsorship of today's TechTalk. Myricom supports environments in need of high speed computers and communication devices. This is Judith Boettcher, your CREN host for today, and I'd like now to welcome the technology anchor for TechTalks, Howard Strauss of Princeton. Howard is a well-known information technology expert with expertise in webs and portals and now Linux, almost -- right, Howard? HS: That's what I've been told. It's very fresh expertise here! JB: Okay. HS: I can ask the questions, I just can't answer them, Judith. JB: All right. Well, hey, that's a start! HS: Okay, I'm Howard Strauss, as Judith told you, the technology anchor for the TechTalk series of CREN webcasts. As technology anchor, I'll engage our guest expert in a lively technical dialogue that will answer the questions you'd like answered, and ask those very important follow-up questions. You can ask our guest expert, Kevin Morooney, your own questions by sending e-mail to expert@cren.net at any time during this webcast. If we don't get to your questions during the webcast, we'll provide answers in the webcast archive. In 1991, about a century ago in Web time, a young graduate student named Linus Torvalds at the University of Helsinki in Finland created a new operating system as a hobby. He named the operating system that he created "Linux" after himself and the Unix operating system it was based on. Linus had been fascinated by a small Unix system called Minix that he thought he could expand and improve upon. He worked steadily from 1991 until 1994 when he released version 1.0 of the Linux [inaudible] into the world. He made the source code free and available to everyone. 
That some graduate student wrote an operating system that he made available to the world would not normally be anything to write about. In every computer science and engineering department, there are students churning out operating systems, compilers, database systems and other information technology creations that they would be only too happy to have anyone else in the world use. But Linux was small, free, and a good enough variant of Unix to be attractive to a group of people who needed or wanted to control their own operating system. Some found it attractive simply because they were hostile to Microsoft, Sun and the other corporate software titans. Not surprisingly, Linux has appeared on campus -- as every new thing does, especially a new anti-establishment thing. Since it is free and available on the Net, no one on campus needed permission to grab it. As Kevin Morooney describes it, "Linux just creeps into your campus infrastructure." That was fine for awhile, until more users -- especially users less technically savvy than the original early adopters -- started using Linux. Those latecomers to Linux needed support from the central IT group. Suddenly this thing that had crept in at night was visible and threatened to be a lot more visible. When the number of Linux users grew, several companies such as RedHat, Motorola, IBM, Corel and Caldera, to name just a few, saw an opportunity to make money from it and offered special versions of it for sale that even included support. Linux also became especially attractive to the research community, especially those solving very complex problems that required large parallel computers. As a result, government, industry and university labs sprung up that used Linux for advanced research projects. 
Like a single butterfly flapping its wings that is supposed to have an effect on global weather, an obscure Finnish grad student, poking at computers as a hobby in Helsinki in 1991 is sending ripples through the entire IT community. If you haven't felt the impact of Linux yet, you will soon! We'll help you understand the Linux phenomenon, especially as it relates to research computing, and show you how you can enjoy riding those Linux ripples (rather than having them swamp you) on today's webcast of TechTalk. Judith? JB: Well, thank you very much Howard, and it's going to be exciting talking today about this new operating system coming into all the various organizations. And we'll find out when and how and where perhaps they come in with our conversation with Kevin today. Let me introduce Kevin, our guest expert today. Kevin [Morooney] is the Director for Graduate Education and Research Services Group in the Center for Academic Computing at Penn State University. We will talk about a couple of special projects that he's working on in the Numerically Intensive Computing group, including a project called the Lion-X High Speed Linux Cluster. And for those of you who are listening and don't see how Lion-X is spelled, you might be interested to see that it's "Lion" and then with a dash and then X. And Kevin, maybe you'll tell us why it's called that as a way to get started. KM: Well, we had a bit of an informal contest, as happens sometimes in computer office and information technology environments. For instance, they have to name a new machine or something like that. And we were just sitting around some day and the biggest guy in the group said Lion-X because it sounds like Linux and it's got Penn State Nittany Lions in it, so -- JB: There it is! KM: So we just ran with it. JB: So it's easy to remember, too, right? KM: Yes. HS: At Princeton, I guess we'd somehow have to work tigers. It's a very difficult job. You have an appropriate mascot, at least. JB: Well, Tiger-X. 
HS: Well, no, but then it wouldn't sound like Linux. JB: Right. HS: There's more constraints to this problem than you thought! Kevin, could you tell us what there is about Linux that makes it especially attractive to researchers like yourself and other folks who are using this? Why is this really so much more attractive than, say, other variants of Unix or Windows or Solaris or something like that? KM: I'd say that Unix in general is still the operating system of choice for the bulk of the researchers on our campuses here at Penn State. Linux is just really another variant of Unix to most of those researchers. I mean, most simulation-oriented principal investigators are really concerned about the performance of their machine, not so much with the operating system. And I think what Linux has represented to them in terms of clustering is a very low-cost and reasonably well-supported way to build distributed memory parallel computers specific to their departments or their research group's needs. HS: Kevin, you said this thing is being used on a lot of parallel computers. Could you say a few more words about the kind of machines this is being used on? KM: Well, there are Linux clusters in the spectrum of information technology. There are some very small Linux clusters that exist on our campus -- and I'm sure on many other campuses -- that are in the two, four, six, eight node range (where a node would be a single host connected with, say, an Ethernet hub even, not even necessarily an Ethernet switch). And then this notion has been scaled up to support very, very large clusters operated by the Department of Energy at Sandia National Laboratories, the Los Alamos National Laboratories, in the ASCI program from the Department of Energy -- the Accelerated Strategic Computing Initiative -- where they're operating clusters that are in the 256-node range. All of those 256 nodes being two-way symmetric multiprocessors, so there's actually 512 CPU's. 
So the notion of Linux clustering can take you from this very small departmental resource to some of the larger computers operated by federal labs. This phenomenon is also being played out in the National Science Foundation PACI program (Partnerships for Advanced Computational Infrastructure) and can also be seen in the Department of Defense labs as well. HS: Kevin, you said that one of the reasons -- in fact, I think the main reason you gave us for people using Linux was that it was not very expensive, namely free, I guess, in most cases. But it seems like we're putting this on quite expensive hardware. Why would we be so concerned about the cost? Are there other reasons that people are attracted to Linux? KM: Well, actually, Linux isn't on very expensive hardware, I don't think, in most of these cases. If you were, for instance, to look at those smaller clusters that serve the specific research groups and departments, in some cases, they're re-purposing older Pentium processors that they may have laying around, maybe Pentium II's in the 200 to 300 megahertz range. But they're looking for a need. They have a need that they need to solve, and rather than go out and buy software to run on it, they can run Linux on it and then have a plethora of software, whether it's web serving or those kind of things. But in the parallel computing environment, the programming libraries that are important to folks who do distributed memory parallel computing are well-supported and run well on these processors and under Linux. So it actually does represent a low cost solution because if they had -- you've already capitalized the hardware and now you've installed Linux on it. That didn't cost you anything. You could choose to not have it cost much. JB: Kevin, could you give us an example of one of the departments at Penn State and the size of the cluster that they have? 
KM: The largest one outside of our center is in the Mechanical Engineering department in a group called the Propulsion Engineering Research Center. They've eclipsed 75 nodes now. There's another one in the Aerospace Engineering department, operated also out of the Institute for High Performance Computing Applications, which I believe now is over the 50 host limit, which would put it probably in the 100 CPU range. JB: That's nice. What about the low range at Penn State? Do you have a sense of that? KM: The last time we checked in with the department of Meteorology, who also was one of the very early users of our Lion-X cluster, they had started out with just a two CPU cluster to sort of understand what it took to run something like this and to write and run programs on it. And I believe, the last time I had a conversation with someone that was tracking them, they were up to eight hosts at this point. HS: Kevin, you mentioned a distributed memory parallel computing thing that you were doing. Is this the kind of thing -- this kind of computer -- the kind of thing where you're doing most of your research projects on or that are being done at Penn State? KM: Well, we've been supporting distributed memory parallel computing -- HS: Perhaps you ought to tell folks what that is. KM: Sure. Parallel computing implies that you're going to decompose your scientific simulation, either from a data perspective or a work perspective, so that it can run on multiple processors simultaneously. And by distributed memory, what I'm implying is that the program is decomposed in such a way that different sections of the program -- or perhaps even just multiple copies of the same program, depending on how the algorithm is written -- run on separate hosts. So if you're just doing this on, let's say, a Local Area Network in your office, at Penn State, it could be host "A.psu.edu" communicating with host "B.psu.edu". 
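As a rough illustration of the data decomposition described here, the following is a minimal Python sketch, not anything taken from the Penn State codes. It assumes the problem is a simple sum over a list and that each "rank" stands in for one host; the names `block_ranges` and `simulated_parallel_sum` are hypothetical, and everything runs in one process rather than on separate machines.

```python
def block_ranges(n_items, n_ranks):
    """Split n_items into n_ranks contiguous [start, stop) ranges.

    Range sizes differ by at most one, so the work is spread as evenly
    as a block decomposition allows.
    """
    base, extra = divmod(n_items, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks each take one additional item.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def simulated_parallel_sum(values, n_ranks):
    """Compute one partial sum per rank, then combine the partials.

    On a real cluster each partial sum would be computed on a different
    host; here the ranks are simulated one after another.
    """
    partials = [sum(values[a:b]) for a, b in block_ranges(len(values), n_ranks)]
    return partials, sum(partials)


if __name__ == "__main__":
    data = list(range(1, 11))          # the numbers 1..10
    partials, total = simulated_parallel_sum(data, 2)
    print(partials, total)             # two partial sums and their total
```

With two ranks, each rank owns half of the data and only the two partial sums would need to cross the network, which is the point of decomposing the problem before distributing it.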
So you've decomposed your work in such a way that you'll be communicating between those two hosts using TCP/IP to break down that problem. There's another style of parallelism that many folks refer to as SMP parallelism, or on SMP hosts, symmetric multiprocessors, where communication is done across a bus over shared memory. JB: Okay, so the TCP/IP is in fact the communication protocol that's used, then, within all of these clusters? KM: Most often, yes. JB: Okay. HS: We have a question that just came in and we have other questions that have come in, and we'll get to them when we get to the part of the talk that's more appropriate for them. Some folks have got some more advanced questions. But this is sort of a basic question that we got in from Claudia Rivera at the University of Texas at El Paso. I'll give you her question, but I'd like for you to answer it more generally. She says, "Can you make a Linux drive to be seen on an NT station?" But I think the more general question is how does Linux play with other operating systems? KM: There's a handful of ways to answer Claudia's question. I'm also a personal user of Linux. It's not just something that we use to support research computing here. One way that you can do this specifically is you can mount -- if you were to have, let's say, a laptop or a single workstation, define a file system that is a DOS file system and you can mount it from Linux and see data on that DOS file system. And then when you were to boot NT on that machine, it would look like a D drive or an E drive or something. And you can share data between a single host that might boot both operating systems. In a more distributed sense, you can run Samba on a Linux server or become a Samba client and access an NT workstation that's doing Samba service. So you can play in distributed file system environments. HS: Okay, and that's true of just Microsoft stuff? Is that true with the Mac operating system as well? 
KM: Well, there are Samba implementations for Mac OS as well, so yes, I mean, you can have your Linux workstations or servers play in a Samba environment if you choose to. And Linux also has NFS as most other Unix systems do as well. HS: Okay, you talked a little bit about the name Lion-X. Maybe you could tell us, now that we know how the name was chosen, what it's all about. KM: Well, Lion-X started, really, several years ago as a partnership between both our Center for Academic Computing and the Department of Aerospace Engineering and Professor Lyle Long over there. We had bought eight 266 megahertz Pentium II's with a handful of memory on it and installed Solaris x86 on it. And we were starting to dabble in this clustering with Pentium processors. We run a large SP computer here, so we understood how to operate a parallel computer but were curious about how to start doing this in this relatively new way. And then that investigation sort of crawled along for a little while. We were learning some lessons. And then Dr. Long went off and built himself a large, successful Linux cluster in his group, and then we did the same thing over here at the Center for Academic Computing. Our cluster is composed of 32 processors. They're all Dell processors. We have one central server node so it makes for a total of 33 processors. Our compute nodes are two-way 500 megahertz Pentium III processors and each host has a gigabyte of memory. So each CPU, if you were to perfectly decompose your problem in a very symmetric way, would have access to 512 megabytes of memory. HS: Who's solving problems on these things? Are these graduate students, faculty, outside people coming into the University or -- KM: Mostly grad students and faculty. Just as it's been with the SP over the years. Our usage profile, we really have only been in production with that facility since September. We've been running jobs on it since about July. 
Our usage profile doesn't look all that different than it does on the SP and as it does at different national centers. A lot of calculations in chemistry, physics, meteorology, computational fluid dynamics. It's pretty much the same crowd that uses those resources year in, year out. The grad students' names might change, but the research generally--you know, the kind of problems you're solving doesn't change all that much. HS: What about undergrads? Are undergrads using this at all or are undergrads using any Linux clusters at Penn State? KM: I'm not sure if any undergraduates have specifically used Lion-X. The SP has been used in undergraduate classes to help teach parallel computing. You know, we'll be doing the same thing with Lion-X, I'm certain, just because we have that relationship still with those instructors and with those courses. JB: Kevin, from what you mentioned, you've got a number of these clusters, sometimes called -- I guess you call them Beowulf clusters -- in a number of places across campus. Do you link those clusters as well with each other? KM: There's no formal linkage now. It's mostly been a social linkage where someone will run on their cluster and then they'll run something on our cluster. When they notice differences when they use the same communications substrate, they'll pick our brains and see, well, gee, maybe we're not running the file system the right way, the best way to get the best performance. And then other times (this is, I think, a kind of neat story) the department of Acoustics runs a 20 CPU cluster for itself, and what they had done is they had run some of their applications on our resource and one of the things our resource has -- the Lion-X resource has -- is three different networks that connect the compute nodes. We have Gigabit Ethernet, Myrinet and Fast Ethernet. 
And the folks over in Acoustics thought that they liked the performance of the Myrinet -- thought it was important for their applications, and built their cluster around Myrinet. They chose to make the investment in networking and they cut back on their investment on the node technology or memory technology. JB: So some of your research with all these clusters has been just what kind of networking works best for the within-cluster communications? KM: Right. One of the design points for Lion-X has been -- it's been relatively tricky to do and I think we're still having success doing it -- is that we wanted to build something that people could use as a parallel computing engine to do simulation and make a sandbox for folks to get in and play when they want to do this for themselves, or just learn. Because there are folks in our Computer Science department who are just interested in the underlying communication libraries and those kinds of things and how they behave. HS: You mentioned Beowulf clusters. What are they all about? KM: Beowulf clusters are clusters of Intel processors running Linux connected via Fast Ethernet. To some folks, I mean, I've had some folks say, "Well, geez, your machine technically isn't a Beowulf" because we have these other networks. But a Beowulf and a Linux cluster -- I think, nine times out of ten when somebody uses either one of those terms, you're pretty much talking about the same thing. HS: Are there things that you're planning to do to Lion-X or Beowulf that are follow-ons to what you've done now? Are these things growing, expanding, turning into something else? KM: I think one of the phenomena -- it hasn't happened on our campus yet. I know from talking to some folks at different energy labs that the following has happened and I'm personally fascinated by it and we're looking for opportunities on our campus to try to execute this idea ourselves. 
Because you can build these clusters relatively inexpensively -- the notion of building an application-specific cluster comes to mind. It used to be, to build a 32 CPU parallel computer, ten years ago, would cost a big chunk of change. You can build a 16 to 32 processor cluster these days for not a tremendous amount of money and if it doesn't cost a whole lot, it's okay if it sits idle sometimes. So if you wanted to build a parallel computing cluster that did high speed rendering for real time animation, it wouldn't be too distasteful to the folks who control the money if that thing was only run during the times when we needed real time animation, for instance. When you buy a thirty million dollar computer, or even a million dollar computer, people like to see it run seven days a week, 24 hours a day and rightfully so. So this notion of application specific clusters is one that we'd like to pursue with -- HS: Do you have any sense of, if you were going to build an application specific cluster, are you comfortable with it running eight hours a day, six hours a day, two hours a day? When does it begin to look okay? KM: Depends on how much money you spent and, really, what the application is. If it were, let's say, something for data mining as it relates to genetics, that would be a very different design point or cost-benefit ratio than it would be perhaps for someone who's using it to do real time rendering for an ImmersaDesk or a Cave environment. For those folks, I don't think it would matter as long as it performed when I was in the Cave to do the virtual reality. HS: When you were in the Cave? JB: A virtual environment. KM: Yes, and when you were simulating a virtual environment, you would want it to perform precisely when you needed it to perform and I don't think it would matter if it was idle 22 hours a day, if you needed it two hours a day. 
For the geneticists, it needs to perform when they use it, and they're probably going to use it more than just two hours a day. So I think the cost-benefit analysis is really based on the nature of the application and the nature of the research. JB: When you're setting up these distributed memory parallel processing environments, the software, the applications are -- obviously, sometimes they're available and sometimes they're not available. Are people using Linux primarily because of -- is it easy to build applications on top of Linux? Is there some characteristic about Linux that makes it easy to do that? KM: In terms of the support of parallel computing, the folks that are using Lion-X and the folks that are using our SP and even these larger ones and these smaller ones have written home-grown codes. They've taken either FORTRAN, C, C++ codes and are using a programming library called MPI, Message Passing Interface, and have decomposed their work and data using MPI and its underpinnings as well (it's more than just a library). But so all those codes are homegrown, but writing an MPI program for a Linux cluster or for an IBM SP or an SGI Origin 2000 -- I mean, there's always porting issues when you move between two platforms or two different centers. But they're nominal. JB: Well, in fact, one of the questions that John Wallace from Dartmouth had was in that area. He was asking about just could you describe your development environment, including compilers, debuggers, etc. And then also very closely related to that is, what support do you provide to users in the programming of the applications, etc.? KM: I'll answer the second one first. We've spent a lot of energy over the last seven, eight years on developing seminar series and having hands-on lab sessions and those kinds of things to help people who think that they're ready to bite the bullet to redesign their program to use MPI. So we have that effort and we've had that ongoing for quite some time. 
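To make the message-passing idea behind MPI a little more concrete, here is a small Python sketch. It is not MPI and does not use any MPI library: threads stand in for ranks and a `queue.Queue` stands in for the network, purely as an assumption for illustration. The function name `run_ranks` and the strided split of the data are hypothetical choices.

```python
import queue
import threading


def run_ranks(values, n_ranks):
    """Sum `values` using n_ranks workers that exchange messages.

    Each worker ("rank") sums its own share of the data. Ranks other
    than 0 send their partial sum to rank 0, which receives one message
    per sender and adds everything up, much like a reduce step.
    """
    to_root = queue.Queue()   # stand-in for the link to rank 0
    result = {}

    def worker(rank):
        # Strided decomposition: rank r owns items r, r + n_ranks, ...
        local = sum(values[rank::n_ranks])
        if rank == 0:
            total = local
            # Blocking receive, once for every other rank.
            for _ in range(n_ranks - 1):
                _sender, partial = to_root.get()
                total += partial
            result["total"] = total
        else:
            # "Send" the partial sum, tagged with the sender's rank.
            to_root.put((rank, local))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


if __name__ == "__main__":
    print(run_ranks(list(range(100)), 4))
```

The same shape (each rank computes locally, then sends a small result to one collecting rank) is what makes such programs reasonably portable between clusters: only the send and receive layer changes, not the decomposition.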
And I think Penn State has, relatively speaking, a fairly mature distributed memory parallel computing set of faculty. I mean, there's a handful of folks out there who have those kinds of parallel computers. And the first question was again? I'm sorry. JB: The first question was if you could describe your development environment. You had mentioned MPI and then he also asked, do you support PVM and OpenMP? KM: Clearly, we support MPI. That's the programming model or library that we support most robustly. We, in fact, do not support PVM today, but by the -- well, actually, it's almost the end of the week, isn't it? Early next week and towards the middle of next week, we're going to be looking at how easy it is. We're going to be [inaudible] run both at the same time because we do have a faculty member at our Nuclear Engineering department who requires occasional access to PVM. So we're going to try to find an easy way to do that. Now, OpenMP is supported by the compilers that we currently have on Lion-X. We have the Absoft compilers and the Portland Group compiler series. So we do support OpenMP. It is there. The availability of graphical debuggers and graphical programming environments is there. We haven't currently purchased anything on our cluster, so it's still print statements and what-not to debug the programs. Not very elegant. JB: Okay. Howard, do we want to go into the other questions? HS: I was hoping, Judith, that you would actually take Joseph Hecht's question, but I'll do it -- JB: Okay, you go ahead! HS: Because you [inaudible] before. Joseph Hecht from the University of Missouri actually has a question about the Beowulf clusters, and he says, "Obviously, the price is right for Beowulf clusters, and the technology is advancing with corporate sponsorship for things like NUMA in these clusters. 
But is the payoff there today in terms of the amount of people-time it takes to get an application or project ready to be able to utilize a Beowulf-style layout, as opposed to the more traditional high compute libraries where the language takes care of all the work of threading and distributing and all that stuff?" KM: I think the answer to that question is really application-specific and also research-agenda-specific. For some folks -- again, turn to the Department of Energy who have a very aggressive simulation agenda placed before them. Because of our inability to detonate nuclear weapons at this point, they have had to go to a new programming model to get to the performances that they need to simulate the kind of events that they're going to simulate. When you get back down to the department level and to the research group level, there are some folks over the years -- this has happened [inaudible] on our campus, where they'll come talk to us and ask about, "What do you think it would take to port my program to use MPI or PVM?" And we'll have a discussion and they'll decide sometimes to buy themselves out of the solution by buying a bigger box that has, perhaps, more CPU's, it supports an SMP model, or that there are programming libraries that will just take advantage of the distributed memory over the bus. And that serves their needs for another two years and they continue with that cycle. You see some Department of Defense centers -- and Department of Energy centers as well -- that still invest in, not necessarily NUMA, but it is the more classical style of supercomputing because they've made the determination that porting these critical applications just isn't there yet. So there's always going to be a place for NUMA architectures and OpenMP/SMP parallelism. It's just an issue of what's right for you and what kind of investment you can make now. 
I think it has shifted a little bit, though, in that it is very inexpensive cost wise -- not necessarily to run, but cost wise -- to get eight Intel processors connected via Ethernet that are isolated so that their communications aren't dealing with collisions on a Local Area Network, with e-mail and web serving and all that. And it's an environment that you can control. It's not like you have to make this software development investment and then wait in line at a computer center to have your code run. So I think it's shifted a little bit. HS: Kevin, before we get to the next e-mail question (and we have a couple of them waiting here), you mentioned that government labs were doing something with Linux. Are there government labs that have some big commitment to Linux that you folks get involved with? KM: Well, in the Department of Energy, we don't necessarily get involved with them but it's very important to watch them because they have deeper pockets than most of the other centers that we pay attention to. Our model here at Penn State has always been to try to understand best what people are doing at their desktop and in their department, and at the same time, pay very close attention to what Department of Energy, Department of Defense and National Science Foundation is doing so that when someone migrates or grows up from their departmental resource, that if we are going to run a resource here, that it enables them to port their application to our resource in such a way that that porting effort will minimize their time to productivity when they go to these other centers. Argonne National Laboratory has a very, very strong and powerful presence with Linux clusters as does, I think I mentioned before, Sandia, Los Alamos, the usual suspects. HS: Yeah, the real big guns. KM: Yes. 
And the National Computational Science Alliance, one of the PACI centers for the NSF, has a Linux cluster representing them at University of New Mexico called the Roadrunner Cluster. And the Ohio Supercomputer Center is also running a 128 CPU cluster from SGI -- Linux cluster from SGI. So different flavors as well. JB: Okay, in fact, your comment about all the different flavors brings us to a topic I think we want to talk about. HS: And to Claudia's question, in fact. JB: And Claudia's question. Why don't you go ahead, Howard? HS: Okay. Claudia Rivera, who's appearing here for the second time in this webcast, from the University of Texas El Paso, says, "Could you please give me a brief explanation of all the flavors of Linux?" Well, probably you don't want to go -- KM: Whew! HS: --and give us an explanation of all them, but I think her point is that, gee, there seems to be a lot of those things out there. She says they're planning to migrate to Linux from Power PC Solaris and NT and she wonders how she decides what the best choice of all the flavors of Linux out there are. JB: Yeah, and I think that it links into the other topic that we want to talk a little bit about was, you know, how easy is it to get started with Linux? You had said that it was creeping in, so how easy is it? And if people are creeping in, what are they doing and which choices are they making here? HS: So we created three questions now and we're just going to go -- JB: Right, 35 questions! KM: [inaudible] Claudia's question and the last one. In terms of sort of -- it would be impossible, as Judith alluded, to digest or talk about -- HS: Well, I looked out on the Web. I was absolutely astonished at how many different things, how many different flavors of Linux were out there. JB: And there's both the free versions and the commercial versions, right? KM: Right. HS: But how, I mean, how do you figure out which one to use? KM: Try one. And see if you like it. HS: But you'll be there forever! 
KM: Probably. HS: Do you have a recommendation or two to make? KM: What's that? HS: Do you have a recommendation or two to make? KM: Personally, I started with the RedHat distribution. HS: That's one you pay for, right? KM: Well, you don't have to. You can FTP it from their website. You can either choose to buy the CD or you could FTP it. I chose, on my first -- HS: It depends on whether you want the bits or a piece of plastic. KM: Yes. And the plastic, you know, you'll probably also buy a little bit of documentation, which I did, so as an individual user, I started with the RedHat distribution. But someone else, not in my group but somewhere else in the building had started messing around with it and I had heard about RedHat, so I thought, okay, I'll try that because he had tried it. I didn't do a serious investigation here. I just thought, all right, it appears to work, I'll give it a shot. HS: How different are these, one from the other? I mean, if you did pick one almost at random, how badly could you go wrong? KM: Not very. It really depends on what you're looking for, but I think there's two kinds of people in the world, and with St. Patrick's day coming up, I won't mention that one. But there's two kinds of people in the world, you know, those who it really matters to them which distribution they have, and the other one is, when I type "ls" it shows me a listing of my files. If you're the kind of person who's really just looking for a Unix environment to operate in because that's what you're comfortable with, it's probably not going to make a tremendous amount of difference to you which one you actually run. Another sort of sub-issue here to perhaps discuss is if you have a machine that currently is running Windows and you want to install Linux on it, that's one issue. If you buy a machine that has Linux pre-installed in it, that's another issue. 
Installing Linux on a machine now is not as easy as it probably should be for those who are most fervent in the Linux community for it to try to dominate something like the desktop. It's still -- you know, when you install Linux on a machine, there's some times you have to get down and dirty and roll up your sleeves and learn more than you probably wanted to when you just wanted to try this thing out. One thing that's very good about that, if I can very quickly -- there's another nice phenomenon of Linux is these things called Linux users' groups, and there's thousands of them across the country. And the different bulletin boards and what-not, that's how I got most of my questions answered by just sort of peering in on those conversations and understanding, oh, this is a typical problem, this is what I have to do. It's a very helpful community. JB: With that, along with that, Kevin, I noticed that as we were talking before that you mentioned that some of the major computer manufacturers are starting to offer Linux as a pre-loaded system on their systems. Have you had experience with that, or would you recommend a way to get started with using that? KM: I haven't personally, and no one in my group has bought a pre-installed Linux system. I know that Dell computer enables you to buy, I think it's a RedHat distribution. VA Linux is another company that sells, when you buy a computer from VA Linux it comes with Linux installed on it. HS: But your suggestion is that since it takes some technical expertise to install Linux, if you don't feel comfortable doing that or if you don't have the technical support near you, it would be a good idea to get something already installed. KM: That's -- yes, but also, trying to install Linux, most of the installation procedures now from the RedHats and the [inaudible] and the [inaudible] of the world protect you from really blowing anything away on what you currently have. 
If you're the slightest bit adventurous and you're interested in it, I'd spend the ten bucks down at Computer World or wherever you have to go because you can get these things from your local vendors as well, and try installing it and see how it goes. JB: We did have another question that comes in that perhaps it might now be a good time to insert it. We have a question from Richard Danielson from Laurentian University, and he's asking, "Can you think of any reason why someone might want to put a Linux cluster on their home network?" Have you done that? You're, perhaps, a good candidate, right? KM: My machine is dual booted at home. I have three kids, one of whom prefers Linux, the other two prefer Windows. The other two are still very game-oriented, but running a Linux cluster on a home network, I actually have several friends here at Penn State that have multiple Linux boxes running at home. JB: And why do they do that? KM: I don't know. Because they can. Because they do. They'll maybe have a cable modem connection and they'll run some different protocols there on that machine to help it connect to the other machines. And it's no different than why I might. I might want to have a computer in my son's room or my daughter's room and a computer in the kitchen and [inaudible] how to network these devices together. HS: [inaudible] the kitchen, but what kinds of applications are available on Linux? Is there everything that you could imagine, or are things just sort of popping out one at a time here? KM: In the public domain -- well, I'm not a lawyer so I won't go into the nitty gritty of the GNU Public License. HS: We've had lawyers on TechTalk. KM: In the public domain, there is just about anything you could imagine that's out there. And from my experience, going to a website like FreshMeat.net or regularly reading slashdot.org, the quality of the software is phenomenal. 
It's just amazing how well written it is and how easily it installs and how well it works once you get it going. A great example is GIMP, which is a lot like -- you can make just about any picture in the world you want to with something like GIMP. So in the public domain there's a tremendous amount of software out there, calendaring programs and what-not. Recently, as it relates to our efforts here to support research computing, there have been some fairly major announcements in the last three to six months. Most recently SAS Institute has announced that they are going to have SAS products available within the next three months, I believe, in that near a time frame. MSC Nastran, which is a large package that has pre- and post-processing for solid modeling and finite element analysis, already has a Linux product out there. The airline industry, the auto industry, are big users of Nastran, as is the academic industry. Fluent, Inc., they have a handful of software offerings, but the one that we use here at Penn State is called FIDAP. That is also available on Linux. Data Explorer, which is a high powered visualization program written by IBM, is actually now open-sourced. It's not available to purchase for Linux, but rather, you can go get the source code to Data Explorer and run that on your Linux box as well. MATLAB is available for Linux, which is a symbolic math package, and I believe Oracle even has their database available for Linux. HS: Yeah, in fact, I just saw when I was looking for Linux things on the Web, I saw that Corel has all their products now running on Linux. KM: Yes, Corel has an office suite, as does -- Sun Microsystems bought StarOffice and has StarOffice, which is an Office look-alike kind of thing. So yes, there's a lot of tools out there. JB: Kevin, a problem on many campuses is the challenge of supporting all these various different operating systems. 
How are you supporting it at Penn State and should campuses be planning on having to support Linux? KM: I think campuses should be planning to understand how -- should be understanding how Linux is creeping onto their campus. I think here at Penn State and perhaps other places, we're still at the stage where we're trying to keep up with the Linux folks. I mean, its initial entry point is going to be with a very self-reliant, aggressive, adventurous crowd. So we're definitely partners in this sort of larger-scale investigation which the Linux [inaudible]. But as you mentioned in the introduction, there is this newer phenomenon -- and we've seen this at the Penn State Linux users' group here just with the traffic on that listserve -- that some very uninitiated folks are installing it and trying it and -- I don't know if they're having fun, but they're having some questions. And they're very different questions than what we perhaps would have seen a year ago. So I'm not sure necessarily how you prepare for that, but I think gathering expertise in your organization to help these things like these Linux user groups, again, it's a very self-reliant crowd. I think getting in and being a part of that crowd is one way to do that. And to help galvanize a Linux group. HS: Right, but as you mentioned, and as I mentioned in the beginning, the phenomenon that we've seen is that first it creeps into a few researchers' desks, but these researchers live near people on campus. Their offices are nearby and somebody pops in and says, "What are you doing?" And they say, "I'm doing Linux," and they say, "Oh, that sounds nice! Let me do it!" And they're less technically savvy and begin to require some support. Can the regular support group, the regular Unix support folks deal with this? Do we need special expertise on campus to do it? 
KM: I would treat Linux as another Unix variant, depending upon how a potential IT organization is doing external Unix consulting or Unix support consulting. Typically, from what I've seen, a shop will have a Solaris expert and perhaps an Irix expert and an AIX expert, or ask someone to double up on it or something. And I think you need to point a person at Linux expertise, particularly as we already talked about this, a lot of different distributions out there. JB: We had a couple of other questions that were a little more technical in nature, Kevin, but I think now might be a good time to address them. They have to do with user access to the clusters, and maybe perhaps clump them, just like our users clump them. He's asking questions -- and again, this was John Wallace from Dartmouth -- how is resource allocation handled for the cluster? And how do users submit jobs to the system and how are they authenticated? So it's that clump of just how are folks, researchers accessing -- it must be the general cluster as opposed to those that they would have in their department. KM: Well, currently we support secure shell connections which most of the big labs do, so as opposed to perhaps telnetting to a resource, you'd issue it a secure shell command. Now, that's how you physically connect to the resource. Resource allocation, we've chosen the Portable Batch System which was developed at NASA, which is an offshoot from a lot of their work years ago in batch systems. They tried, I think successfully, to make the Portable Batch System (or PBS) work in such a way or be developed in such a way that it could run on just about any high-performance computer. So we've chosen PBS. There are others who use things like Load Sharing Facility (LSF) for [inaudible] computing, which is a product that you would buy. PBS is also in the public domain. You can use DQS. 
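[Editor's note: The access pattern Kevin describes here (connect over secure shell, then hand the work to the Portable Batch System) can be sketched roughly as below. This is an illustrative sketch only, not Penn State's actual configuration. The function names, user name, host name, job name, program name, and resource values are all hypothetical, and the exact `-l` resource syntax varies between PBS versions and sites. The code only builds the script text and the command line; it does not contact any cluster.]

```python
def build_pbs_script(job_name, nodes, ppn, walltime, command):
    """Return the text of a PBS batch script (all values are illustrative)."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",               # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,                             # the program the job should run
    ]
    return "\n".join(lines) + "\n"


def submit_command(user, host, script_path):
    """Return the argv that would run qsub on the cluster over secure shell."""
    return ["ssh", f"{user}@{host}", "qsub", script_path]


if __name__ == "__main__":
    # Hypothetical job: 4 nodes, 2 processors each, one hour of wall time.
    print(build_pbs_script("demo", 4, 2, "01:00:00", "./my_simulation"))
    print(" ".join(submit_command("researcher", "cluster.example.edu", "demo.pbs")))
```

[In practice the generated script would be saved to a file on the cluster and the returned command run from the user's own machine; the scheduler then decides when and where the job runs.]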
There's two of the clusters on campus that use DQS, which is the Distributed Queuing System which was developed at Florida State's Supercomputer Computations Research Institute. There's many different ways you can do resource allocation or sort of queue management on your cluster. PBS seems to be -- it's a sweet spot for a lot of clusters out there, which is the reason we chose it. There seem to be a lot of the folks that are on the important listserves and majordomos out there seem to use PBS and we wanted to take advantage of that expertise that already existed. For authentication, right now we currently do standard Unix authentication, but there are pluggable modules that we can use to tie -- and we are going to do this -- to tie them into our enterprise-wide, university-wide K5 based environment. HS: That's Kerberos, when you say K5? KM: Yes, sorry. HS: That's okay. Just expanding your one letter to a word. KM: That's correct. HS: Who controls Linux? It sounds like there's all these versions of the thing around. There's the RedHat version and the Caldera version and every other version, yet I keep hearing folks say, "Oh, there's version 4 or version 6 or whatever." How do these people coordinate this so that there really is version -- you know, a different version from Honeywell and RedHat or whoever? KM: Well, yes, all the vendors have their own label on Linux and as a result, have their own versioning. And oftentimes it becomes, I think, a marketing race to see who can come out with their version 7, 8 or 9 first. The basic Linux kernel, however, is the code developed by a relatively small core of experts still led by Linus Torvalds. But there's many contributors worldwide who help with different device drivers and different specific parts of the kernel that relate to, perhaps, file system management and those kinds of things. HS: Are these folks doing this on their own? 
I mean, they're doing this because this is fun to do or they're doing this because they're getting paid to do it, or how does this thing work? I mean, I know that Linux has open source code so everybody can see the source code. Are people just going out and grabbing pieces of it, or is this coordinated in some way? KM: It is coordinated, but relatively loosely to what perhaps you may see in a corporate environment. If someone sees a need and thinks that they can serve that need, then they'll start serving that need as best as they can and then start working with the more core set of developers to get what they've developed into the basic kernel tree. JB: We had an e-mail when we were talking about authentication, the other half that always comes in, and that is security. Does Linux suffer from some of the same concerns towards security as some of the other Unix operating systems? KM: Well, you know, all operating systems have security problems. JB: So no one's singled out here! KM: Yeah, [inaudible] you think you're safe, you're not. The fact that you think you're safe, you're dead! Should always be very afraid. JB: If you're feeling comfortable, obviously you haven't heard what I've been saying, right? KM: Yes. HS: But is Linux better or worse? Are there some special problems or some special reasons? Strangely enough, I heard Kevin Mitnick, who is the hacker who spent, I think he said 59 months in jail -- that's a lot of time -- they were saying that Linux was much more secure than other systems because of the open source code. A thing that I've heard other people say that, no, open source code makes things less secure. Do you have any feelings about that, or do you think it really doesn't matter? KM: Well, I think it matters. I didn't see specifically what he had said. HS: It's actually on the Web, as is everything. 
KM: I think the notion that he may be referring to, though, is the fact that the source code is out there does enable bad guys to exploit holes, but it also enables good guys to see holes before they're exploited. HS: I think that was his point. KM: And no other -- well, there are other open operating systems, but when you have the open source like that, those things can be detected before the bad guys get to them. And also, if it's a problem and there's a handful of people who can fix it, then they will fix it. Or you can choose to fix it for yourself before it gets into the standard kernel distribution. You can see, "Oh, geez, I've heard about this problem, this guy's posted a fix. I will apply the patch, recompile, I've got a new kernel, it's safe." The time to be protecting yourself more, at least, than you were before can be significantly reduced. HS: Okay, but doing that kind of thing takes some level of expertise to [inaudible] kind of thing. You really have to stay up on this thing. It's not like I go to sleep and I wait until somebody bangs on my door and says, "Hey, get a new release of the operating system!" KM: Right. You do have to pay attention to it, but I think sort of the IT community at large, I mean, I look at -- I run Windows 98 and Linux at home and I probably haven't applied all the patches I should be to Windows 98 as well. I mean, I don't think we're doing a good job of training ourselves or just training the ubiquitous masses like my mom to, like, "Hey, mom, you should be checking out..." HS: Are you going to have your mom using Linux? KM: I'd prefer to because I could actually do some remote systems management for her because when she has a problem sometimes, it would be helpful. HS: If you're listening, Mrs. Morooney! KM: Oh, God! [inaudible] HS: Kevin has some interesting things planned for you! 
KM: One of the challenges with security, though, is that when you have newer folks trying it, they may not understand what as Unix users we would usually understand as, "Oh, geez, that's a well-known thing you shouldn't do!" Well, it's only well-known if you've been doing it, so as Linux stops creeping and starts marching sort of, in either your organization or your campus, I think that can be an issue. JB: Okay, you know, our time is just running away very quickly on us and we always try and talk about just a little practical question at the very end, Kevin. And we had some questions saying, what about campuses? If they're not using Linux now, should they start? If they're not doing parallel computing, should they still be interested? Just who should -- which campuses perhaps should do this and why? KM: Well, I think just looking at it from a statistical point of view, the larger your campus is, the more likely it is that Linux -- if you don't know it's there, it probably already is there. Again, it's either going to creep into your dormitories or it's going to creep into your research buildings, like Howard was saying. The best way to prepare or, I don't know, prepare or get ready, but gaining expertise in your central IT organization is the best thing you can do, just by enabling someone to install on a machine or just letting it happen. JB: So you've really encouraged just supporting people if they want to experiment in this direction, to let them go ahead and explore and support in the background, then? KM: Yes. JB: To start with. KM: And I think it's a very healthy way to do it because that's how most people get started. You'll be just like one of them. I don't think -- you don't start doing Linux on your campus, Linux starts entering your campus, and it is to a certain extent a reaction. JB: Okay. Howard, do you have a final question or comment? HS: Yes, I do. 
Judith just said that we try to end with a practical question, and I have a question that's not practical. I think we're going to -- KM: Leave my mother out of it! HS: And it's something that I'm curious about and I think other folks listening might be. And that is, what's Linus Torvalds doing now? You said he still had some hand in the development of the Linux kernel, but I also heard something about him starting some dot com startup or something. Do you know anything about that? KM: He's involved with a company called TransMeta.com and I don't follow him very, very closely, but as I recall, the company is building hardware that runs Linux that'll enable -- which a lot of companies are doing -- handheld devices and sort of this ubiquitous network environment where they would be running Linux and there's some emulation capabilities of being able to run Windows code on this hardware as well. But I haven't followed it terribly, terribly closely, but yes, he's still involved in kernel development and kernel management. HS: And also doing this dot com kind of thing. KM: Yes. HS: See, we're really trying to change TechTalk into Wall Street Week and this is a subtle way of doing it! JB: Okay, great. Kevin, our guest expert usually has one final chance to say, if you want to have any final comment for our audience out there. You don't have to. KM: Not particularly to the audience, but I'd just like to say thanks to both of you for inviting me. I've been a listener, and it's been very interesting to sit on the other side, and it's been real fun. JB: All right, well, thanks very much. I will go ahead, then, with some closing notes here. I'd like to invite everyone to set aside time on their calendars for two weeks from today on March 30th for a session on Preparing for Campus Portals. We have our two guest experts, Christine Geist from the Rochester Institute of Technology and [inaudible] Wagner from the City University of New York. So we invite you to join us then. 
Many thanks to all of the institutions who help to support these TechTalks. We invite you and your institution to help support them by becoming a CREN member. Special thanks to Myricom for partial sponsorship of today's TechTalk, and thanks to everyone else who helped make this event possible today: to our guest expert, Kevin Morooney; to our technology anchor, Howard Strauss; to Terry Calhoun, event page producer; to David Smith and Patty Gaul of CREN; to Julia O'Brien, Jason Russell, Carol Wadsworth and the whole support team at the Merit Network; to Susie Berneis, audio file transcriber; to Laurel Erickson, transcript editor and indexer; and finally, a thanks to all of you for being here and for your very good questions. You were here because it's time. Bye, Kevin. Bye, Howard. Take care, we'll see you next time. HS: Bye, Judith. Bye, Kevin. Thank you. KM: Bye, now.