Campus Communication Strategies Transcript

Network Management: Basics

Ken Klingenstein
Director of Computing and Network Services
University of Colorado, Boulder
ken.klingenstein@colorado.edu

As networks have grown in complexity and importance, so too has network management. There is a lot of material to cover here and not a lot of megabytes to store it on, and so this presentation will be at the 50,000-foot level of network management. We will focus on some of the particular tools and best practices in the second part of this seminar.

In modern networks, we want to consider a comprehensive view of the services offered. There is a physical core of equipment, of fiber and wire, of routers and switches and modems to connect the off-campus and asynchronous user into the fabric.

Above the physical core is a layer of software intended to make the physical core usable by applications. This network layer includes domain name service, routing algorithms, etc. Another piece of functionality that we find at this layer is the LAN operating system, intended primarily to facilitate the sharing of resources such as files and printers among a small group of users. One other major component of this layer is the constant monitoring of the network for adequate performance and then re-engineering those components of the network that represent today's bottlenecks.

For the network layer to work requires a layer of glue above the network, and we'll call that the enterprise layer. It's a set of services that provide a global environment for users and applications to function in. Typically one finds security services, services that give you a network identity, services that permit that network identity to utilize various databases and resources out on the network, and a file system that spans the entire enterprise, not only to allow users to access their information regardless of their location but also to let us provide backup services for the desktops that constitute the user base. Other services at this level may include enterprise-wide printing, a definitive e-mail service, the mother of all Web services for the enterprise, and other communication utilities such as NetNews.

Cutting across all of these layers is a set of organizational and operational issues that need to work hand-in-glove with each of these layers.

Typically, within operations we would find concerns such as a network information center that can train users on how to use these tools, a comprehensive network operations center that provides complete diagnostic, maintenance, and repair services, and trouble ticket systems that facilitate those diagnostics and repairs.

If you were to crack open a textbook on classical network management, you would find the information organized more topically than functionally. The topics covered in classical network management include faults and outages, configuration of network equipment such as routers and switches, accounting for resources such as disk space and printing, security of the LAN environment, and, as we said earlier, the constant issue of performance and reengineering. While this organization works as a textbook treatment, we want to present a more holistic and comprehensive view of the network management environment.

The physical core consists of the desktop wiring, the vertical fiber within buildings, and the interbuilding fiber. Connecting those physical elements together are a set of electronic components, such as routers and switches, that require inventory, management, diagnostic services, etc. So, across this fabric of physical media and network electronics, we need to have a set of maintenance policies, diagnostic services, and repair approaches to make the physical core whole. Beyond that, we are learning today that we need to promote institutional standards for those components that are part of the physical core but not part of the central operation. For example, the cards within computers, the network interface cards, have to be standardized in a modern campus environment so that the cards will work when a computer moves from its location close to a phone closet to a location farther down the hall. Lastly, at this physical core level, there is a set of LAN Operating System/LAN contention issues that we have to focus on. Together this is called the MAC layer, or the Medium Access Control layer of the network.

What kinds of activities typically happen to manage the physical core? We have to diagnose outages. Worse than outages are intermittents, those problems that seem to crop up when you're not looking and that go away when you apply the network analyzer. Secondly, there is the continuing need to upgrade the current bottleneck, and so you will discover the need to move routers into smaller buildings and to buy larger routers where the traffic load warrants. There will be the need to convert from shared Ethernet environments to switched Ethernet environments, again to handle load factors. A third component of managing the physical core is to do an ongoing analysis of the traffic on the network. That analysis can happen at several layers. You can look at the packets per second being passed across the network. You can look at the megabits per second of traffic. You can look at your utilization of capacity. We are also now interested in higher levels of analysis, where we determine what protocols and perhaps even what applications are dominating our use of the network so that we can better tune our environment to handle the actual traffic load being offered.

As we talk about the need for institutional standards, it's important to understand that we need to construct formal development processes that create those standards. Those processes need to reach out to the clueful people on our campus and involve them in the standards development, because those are exactly the people who will be called upon to toe the line of the standard and to spread that standard outward within their departments. Standards promulgation and enforcement policies are a final step in having effective standards to manage our complex environment.

What kind of approaches do we use here at the physical core? Well, certainly network analyzers are a major tool, as they historically have been. In addition, there's a set of tools that focus on working with the physical media, such as Time Domain Reflectometers. Then there's a set of protocols and software tools that we use increasingly to debug problems in our network. Most conspicuous among these are SNMP, the Simple Network Management Protocol; a remote monitoring MIB called RMON; and then a network management system, or NMS, that seeks to wrap all of these particular elements into a comprehensive environment. One last tool that we wish we had in better shape today is the trouble ticket system that would facilitate the operations of repair and diagnosis.

How do we measure how we're doing? Well, one way is to measure our mean time between failures, or MTBF. How robust is the equipment? Typically, you may find in some routers today that the mean time between failures is actually in the several-year category, and that exceeds a quantity that we call mean time to obsolescence, which may mean that the router never fails during its useful lifetime. But when it does fail, what is of considerable importance to our users is that we have a good mean time to repair, or MTTR. Typically, we aim for repair cycles of no more than two hours for major outages.

One other metric of how you're doing in maintaining the physical core is how much staff you have. One of the Holy Grails of management has been to try to determine a ratio between the number of machines you have on your network and the number of network management staff. It seems, though, as we get further into this business, that the important relationship is between the number of participating departmental gurus out there and the central staff, because we've looked to hierarchical staffing arrangements to adequately support our complex networks.

Another useful approach is to adopt a comprehensive set of desktop standards. These standards begin with the network interface cards that connect the desktop computer to the communications environment. The standards can also extend to control the protocol stacks that control those network interface cards. One useful criterion is to standardize on those pieces of software that the user never sees, that is, those pieces of software which serve the applications and the operating system rather than the end user directly.
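
As a rough illustration of how these metrics interact, here is a minimal sketch in Python of the steady-state availability formula, MTBF / (MTBF + MTTR); the figures are hypothetical rather than measurements from any particular campus.

```python
# Availability from MTBF and MTTR -- a back-of-the-envelope sketch.
# The figures below are hypothetical, not measurements from any campus.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A router with a roughly three-year MTBF and a two-hour repair target:
mtbf = 3 * 365 * 24       # ~26,280 hours between failures
mttr = 2                  # two-hour repair goal for major outages
print(f"availability: {availability(mtbf, mttr):.5%}")   # ~99.992%
```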

There are a number of key issues still outstanding about managing the physical core of the network. One of the most important issues is where to put intelligence in the network and how to use it. For example, one can put a monitoring tool such as SNMP inside every port of an Ethernet hub. In doing so, one drives the cost per port of that hub from $100 per port to perhaps several hundred dollars per port. That might be a worthwhile investment if your monitoring environment allows you to tap into that intelligence. On the other hand, if you have more primitive sets of tools available to you centrally, then that intelligence in the hubs will go largely under-utilized.

Monitoring itself has proven to be a complex task. One important issue is data management. Previously, in an environment I was in, we were collecting 20 megabytes a day of network management data for our regional network. The full use we made of those 20 megabytes was to erase them at the end of each day. We had no tools in place to meaningfully use that data. If you are collecting data, you need to apply a set of heuristics that will help determine whether a network outage report should be responded to immediately or whether it can be delayed for several hours. For example, if you have automatic paging in your monitoring system, at 4:00 in the morning, do you want to page your networking staff for an outage that consists of one or two machines going down? Probably not. However, if the outage involves an entire campus subnet, you do want to get staff notified as quickly as possible. One of the tools that helps the operation of heuristics is to have scripting services that can make routine the operations one frequently performs on monitored data. Lastly, it's very important in monitoring to have established what the normal operation of the network looks like, baselines, so that when oddities appear, you know what things should look like in a normal environment.
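
The kind of paging heuristic described above can be captured in a few lines of Python. This is only a sketch; the thresholds and the quiet-hours window are illustrative assumptions, not recommendations.

```python
# A sketch of an outage-paging heuristic of the kind described above.
# The thresholds and the quiet-hours window are illustrative assumptions.

from datetime import time

QUIET_START, QUIET_END = time(22, 0), time(6, 0)

def should_page(hosts_down: int, subnet_down: bool, now: time) -> bool:
    """Page on-call staff immediately, or let the report wait until morning?"""
    if subnet_down:
        return True                     # an entire campus subnet is down: page now
    in_quiet_hours = now >= QUIET_START or now < QUIET_END
    if in_quiet_hours and hosts_down <= 2:
        return False                    # one or two machines at 4:00 a.m. can wait
    return hosts_down > 5               # daytime: page only for a sizable outage

print(should_page(hosts_down=2, subnet_down=False, now=time(4, 0)))   # False
print(should_page(hosts_down=0, subnet_down=True,  now=time(4, 0)))   # True
```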

As we get more and more equipment into this physical core, issues arise about how to do spares and physical maintenance. We've moved from an environment where we keep entire spare routers to back up our existing routers, to an environment where we keep components of routers and we replace just those components -- fans, cards, power supplies, etc. -- that fail. This makes maintenance more cost-effective, but adds a certain degree of complexity to the management of that hardware. Once one has decided on what and how to spare, you need a philosophy about what to do with the equipment that has broken. Do you repair it in-house? Do you drop-kick it back to the vendor? Do you send it out for repair on a more routine cycle?

As the environment becomes more dense, we need to make sure that our tools are integrated; that when a trouble ticket is issued, we can quickly call up the equipment that is being used in the transit of communications; that we have databases that tell us what the end-user has on their desktop to facilitate the diagnosis and repair of the trouble.

The standards process that we alluded to earlier is a particularly difficult effort. Universities are based upon principles of free choice. In that regard, it is often good to think about creating two standards. If you go after one standard, typically you will wind up with half a dozen. If you set your goals on two standards, you may wind up with at most three or four.

Regardless of how many standards you have, you need a uniform way of dealing with exceptions to those standards, and that policy process is essential. In our layered environment, central responsibility for the network only extends to clearly demarcated points within the fabric of the network. Beyond that demarcation point, responsibility may belong to a departmental staff person or to the end user. It is very important to have a clear understanding with departments and end users about where central responsibility ends and departmental responsibility begins.

Lastly, in considering management issues for the physical core, it's important to set the expectations of the users correctly. At times the performance of the network will not be up to the expectations of the user, but that is not a network outage; it is merely misplaced expectations. In a major experiment several years ago, a national file system was established among several universities. Users did not know that their files might be transferred from a location 3,000 miles away rather than from a server down the hall, and users in that environment complained about poor response time, when in fact they were getting fairly good response time, given that the file server was 3,000 miles away. It is important to correlate the expectations of the user with the capacities of the environment.

Above the physical core of the network lies a layer of software that allows us to utilize that physical core. Some of the software is for the actual management of the physical network.

For example, DNS (Domain Name Service) is responsible for converting the names that users type in for machines into IP addresses for routing. Routing, in turn, takes those IP addresses and figures out the path the packets need to take across the network. Multicasting represents a class of applications and tools which seek to spread packets from one source to multiple destinations. Efficiency there is a key issue. Lastly, addressing tools are available to network managers to allow them to decompose a physical network into logical or virtual subnets that group users of like categories together. A second aspect of the network layer software is the LAN Operating Systems that many of us are familiar with, such as Novell, Windows for Workgroups, the Network File System, etc. A LAN Operating System is oriented around the sharing of common local resources such as printers and file systems. At the network layer, we also constantly monitor the performance of the network, looking for the location of today's bottleneck, and reengineer the network to move the bottleneck further down the pipe.

Domain Name Service is a key component of the network layer. It maps machine names into IP addresses. Because Domain Name Service was constructed in a hierarchical fashion, and with a great deal of power, its use has been extended to enable us to do aliasing of machines and services. This permits us to give a common address to a group of machines and allows the user to address those machines without knowing which particular server is providing the service. Another major feature of Domain Name Service is to create subdomains for individual departments to work in, and to delegate authority for mapping machine names into IP addresses to those individual departments. In short, Domain Name Service has become a key point in the network layer for controlling security and managing the environment.
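
To make the name-to-address mapping and aliasing concrete, here is a minimal lookup using only Python's standard-library resolver. The host name is just an example, and the call naturally requires a working DNS configuration.

```python
# A minimal look at name-to-address mapping using only the Python standard
# library resolver; the host name below is just an example.

import socket

name = "www.colorado.edu"          # example name; substitute any host you like
canonical, aliases, addresses = socket.gethostbyname_ex(name)

print("canonical name:", canonical)
print("aliases:       ", aliases)      # CNAME-style aliases, if the site uses them
print("IP addresses:  ", addresses)    # one name may front several servers
```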

Routing is the process of determining a path between IP addresses. Routing tends to break into two fairly distinct areas. Interior routing tends to be concerned with routing within a particular domain or a particular campus. There, the issues that govern our operation of routing include getting adequate performance, providing redundant paths on campus, and building a routing architecture that secures key systems from hacking. At the same time, there is a need to have the campus routing, the interior routing, function in the exterior world. When we turn our attention to exterior routing, however, the issues are no longer performance so much as politics and reachability: we must determine paths to our distant connections that are appropriate for our institution and for the contracts that we've signed with external service providers, and those external service providers in turn have a set of agreements that need to be met through exterior routing protocols. In both interior routing and exterior routing, there has been a tremendous amount of complexity over the last several years, and it is likely that routing will remain an art rather than a science for several more years to come.

When IP was first created, there was only one class of addresses: Class A. The Class A model assumed that there would be a total of 256 networks in the world, each with a large number of machines. When it became clear that this model did not jibe with the emerging Internet, where there were far more than 256 networks and those networks typically held a more modest number of machines, the developers of IP evolved Class B and C addresses to take care of those granularity issues and a Class D address to address some multicasting concerns. Unfortunately, the granularity of A, B, and C does not work when you reach the campus level. Most universities have Class B addresses that govern their entire site, but in turn want to create subnets on the campus so that they can create logical networks that share a physical network. The tool for doing that has been called the subnet mask, and it is a very powerful mechanism that will allow us, for example, to ensure that communications within a single physical network may pass through a router, if only to build security firewalls through that router. To extend the capability of subnet masks, we've created variable-length subnet masks so that we can have one quadrant of campus have subnets with a fairly small number of machines and other quadrants of campus have subnets with larger numbers of machines. Variable-length subnets add more complexity but also more control. One major problem with using subnetting as a management tool is that it requires the hosts connected to the network to be savvy about such issues. Typically, modern operating systems have that intelligence, but some of the older machines on campus and some of the older operating systems may not be subnet-savvy.
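
A small sketch of variable-length subnetting using Python's standard ipaddress module; the 172.16.0.0/16 block is a private, documentation-style example rather than a real campus assignment.

```python
# Carving a Class B-sized block into variable-length subnets with the
# standard-library ipaddress module. The address block is an example, not a
# real campus assignment; a real plan would also keep the ranges from
# overlapping across the whole address space.

import ipaddress

campus = ipaddress.ip_network("172.16.0.0/16")      # a Class B-sized block

# One quadrant of campus gets small /26 subnets (62 usable hosts each) ...
small = list(campus.subnets(new_prefix=26))[:2]

# ... while another quadrant gets larger /22 subnets (1022 usable hosts each).
large = list(campus.subnets(new_prefix=22))[-2:]

for net in small + large:
    print(net, "netmask", net.netmask, "usable hosts:", net.num_addresses - 2)
```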

Broadcasting and multicasting refer to communication patterns where an individual workstation wants to reach many workstations. Typically, you would use that in a video broadcast arena. Broadcasting is also used by much of the network software to discover resources. For example, a machine when it boots may need to determine what its IP address is and will launch a broadcast packet to do so. A machine may need to have routing information; again, a packet is launched at large in the network using a broadcast capability. Because broadcasts can generate so much traffic, they can overwhelm a network and significantly deteriorate network performance. The major tool that we have for limiting the scope and impact of broadcast and multicast packets is configuring the router to not pass those packets. So router configuration is an essential part of making broadcasting and multicasting a viable set of communication tools.
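
A brief illustration of the address ranges involved, using Python's ipaddress module: the directed broadcast address of a subnet reaches every host on it, and multicast traffic falls in the old Class D range (224.0.0.0/4). The addresses are examples only.

```python
# The directed broadcast address of a subnet reaches every host on that
# subnet, and multicast addresses live in the Class D range (224.0.0.0/4).
# Routers are usually configured not to forward either kind of traffic
# beyond its intended scope. Addresses below are examples only.

import ipaddress

subnet = ipaddress.ip_network("192.168.1.0/24")
print("directed broadcast for", subnet, "is", subnet.broadcast_address)

for addr in ("224.0.0.1", "239.255.255.250", "192.168.1.10"):
    print(addr, "is multicast:", ipaddress.ip_address(addr).is_multicast)
```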

Most desktop computers have two modes of functioning. In one mode, they're part of a Local Area Network. In the other mode, they are an Internet client connected to the campus Internet and then the larger Internet. The LAN Operating System component is primarily used for sharing files and disk space on a server and for managing printing. The LAN Operating System adds a layer of account management and a need for a separate set of security tools to monitor and manage the servers and the printing environment. Another major element of network management within the LAN Operating System is the need to monitor resource consumption, such as disk and printing. A last set of issues for LAN management includes locating files and executable programs in the appropriate place. For example, one can have resource-consumptive applications running on the server while the user operates only a local display of that program on their own desktop device. Alternatively, the entire program can be executing on the user's computer. Based upon these and other models, we have to make sure that the datasets being used by these programs can be found by the programs, that user preferences are stored locally for users to access, and yet that there is some kind of global environment so that if a user moves to a different machine, those preferences can move with the user.

So we see that at the network layer, there are a number of management needs. There's accounting for the LAN Operating System. Typically, the LAN protocols are different from the IP that is being used to operate the campus Internet. In that case, one may have to carry those LAN protocols across the campus Internet and drop them off onto a distant LAN if, for example, you're seeking files from a distant server. The techniques used in that case are called encapsulation and tunneling, where the LAN protocol is encapsulated in an IP packet and tunneled across the campus Internet to the destination. One of the more common problems in managing the network layer is controlling the use of broadcast packets. Broadcast storms arise when broadcast packets are inappropriately received by workstations that, in turn, inappropriately respond to those broadcast packets by launching their own broadcasts. In a typical broadcast storm, you may see an escalation of packets within milliseconds that overwhelms the network. As networks grow in use and users, we have to constantly monitor performance and reengineer those components that represent today's bottleneck. Another major activity at the network layer is managing addresses so that we can construct subnets that meaningfully reflect the topology and the user concerns of the desktop devices. It's also important to develop routing that is consistent throughout the environment to avoid routing loops, where packets circulate endlessly without ever reaching their destination. One other major management need at the network layer is to sew together all of the operations from the network layer down through the physical layer. Typically, that overall NOC coordination occurs at the network layer.

When we talk about network layer performance issues, we discover that there are multiple definitions that can be used. Sometimes we're concerned with throughput: what is the total bits per second going through my network? Other times, we're more concerned with delay: what is the typical delay a user sees from when a keystroke is entered to when that keystroke is echoed on the screen? There is a trade-off between throughput and delay. As we increase our throughput, it is likely that for some users the delay will increase. For both throughput and delay, one can look at peak values, or look at a five-minute average that would smooth some of those numbers out, or look at perhaps a daily load. Now, as applications change, the performance parameters that we look at change as well. For example, HTTP, the protocol that underlies the Web that is so extensive today, is a very bursty protocol. It is also a particularly ill-behaved protocol that does not sense network congestion before it puts out its request for information. With the rise of the Web, then, we become more concerned with burst performance versus the five-minute average or the daily load.

We are constantly reengineering our network. In doing so, it is unlikely that we are ever going to get good end-to-end performance; rather, we seem to be pushing around the point of the bottleneck. Separate from the network are the performance issues associated with the server and the client. Is the client software well-tuned? Does the server have sufficient disk space and quick I/O access to make requests for disk information readily available? Unfortunately for the end user, there is no easy way to differentiate bottlenecks in the network from bottlenecks that may be caused by server and client hardware issues. Lastly, we are often asked, "What is the availability of our network?" Fortunately, we can almost always say, "The network is down," because the network is so comprehensive that some element of it is down. Rather than merely saying, "The network is available 99.5% of the time," it is more meaningful to talk about what percentage of the nodes on the network are currently reachable from your desktop.
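
The difference between peak and smoothed throughput can be seen in a few lines of Python; the sample data below is fabricated purely to illustrate the calculation.

```python
# Peak versus smoothed throughput from a series of per-second samples.
# The sample data is fabricated purely to illustrate the calculation.

samples_mbps = [3, 4, 5, 40, 42, 6, 5, 4, 3, 5] * 30   # 300 one-second samples

peak = max(samples_mbps)
five_minute_avg = sum(samples_mbps[:300]) / 300          # 300 s = five minutes

print(f"peak:                {peak} Mbps")                 # dominated by bursts
print(f"five-minute average: {five_minute_avg:.1f} Mbps")  # smooths the bursts out
```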

What approaches do we use at this layer? Well, again, we have to ask, "What do we measure?" And as I just indicated, reachability is a useful metric. Similarly, utilization, delay, and throughput are all meaningful metrics. Many of our tools involve the standard network analyzers that have been around for several years, something that you plug into the network to look at individual packets going by in order to diagnose trouble. Recently, a new set of tools has been added that allows us to logically probe the network without necessarily attaching network analyzers. Those tools include the Simple Network Management Protocol, in two versions, and the databases that SNMP operates on, which are called MIBs, or Management Information Bases. One particularly important MIB is RMON. That has opened up a new set of capabilities. Another set of tools that we utilize at the network layer are processes that allow users to add machines and to obtain addresses for those machines automatically. That has proven to be very valuable as the demand on our networking staff increases, to permit users to register their own hosts. Much as in the physical layer, there is a need for standards at the network layer. These standards will govern how users configure their desktop software to find name servers and to find paths for packets to travel. Lastly, the network layer is often the point where we can most effectively enforce the security of the environment and build firewalls so that external network traffic is carefully regulated as it enters our environment.
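
One way to express reachability as a percentage of answering nodes, rather than a single up/down verdict, is sketched below. The host list is made up, and the use of the system ping command with Linux-style flags is an assumption; adjust for your platform.

```python
# Reachability as a percentage of nodes answering a ping, rather than a single
# up/down verdict for "the network". The host list and the Linux-style ping
# flags are assumptions; adjust both for your own environment.

import subprocess

hosts = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]      # example addresses

def reachable(host: str) -> bool:
    # -c 1: one probe, -W 1: one-second timeout (Linux-style ping flags)
    result = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                            capture_output=True)
    return result.returncode == 0

up = sum(reachable(h) for h in hosts)
print(f"{up}/{len(hosts)} nodes reachable ({100 * up / len(hosts):.0f}%)")
```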

Let's look a little bit more at network analyzers. Network analyzers capture and interpret packets on the network. That could potentially be a tremendous volume of traffic, and so one useful feature of a network analyzer is a trigger mechanism that initiates the capture only when a certain phenomenon has occurred. Once we've triggered, we may not want to capture all subsequent data. We may want to capture only those packets that relate to the triggered event. Such selective captures are called slices. Network analyzers vary a great deal in how they interpret the packets that they have captured. Some will present you with hexadecimal. Some will present you with actual network operating system commands, such as saying, "This packet is initiating a request for a directory on the file server." The interpretation levels vary according to the cost of the network analyzer, and you can find network analyzers at $5,000 up to $15,000 or $20,000. Generally, what one buys in a more expensive network analyzer is better triggers and slices and higher levels of interpretation. One important point to note is that, given their expense, network analyzers are typically dispatched after a network problem has occurred. Because of that, you need to have baselines established so that the network analyzer can compare the data it records during the abnormality to what the normal environment should look like.
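
The trigger-and-slice idea can be sketched independently of any particular analyzer. In the Python sketch below, packets are plain dictionaries standing in for whatever a real capture tool would deliver; the field names are invented for illustration.

```python
# A sketch of the trigger-and-slice idea: ignore traffic until a trigger
# condition fires, then keep only packets related to the triggered event,
# up to a fixed buffer size. Packets here are plain dicts standing in for
# whatever a real capture tool would deliver; field names are invented.

def capture(packets, trigger, slice_filter, buffer_size=100):
    captured, triggered = [], False
    for pkt in packets:
        if not triggered:
            triggered = trigger(pkt)        # e.g. first malformed packet seen
            if not triggered:
                continue
        if slice_filter(pkt):               # keep only packets tied to the event
            captured.append(pkt)
            if len(captured) >= buffer_size:
                break
    return captured

stream = [{"src": "10.0.0.5", "bad_fcs": False},
          {"src": "10.0.0.9", "bad_fcs": True},
          {"src": "10.0.0.9", "bad_fcs": False}]
print(capture(stream,
              trigger=lambda p: p["bad_fcs"],
              slice_filter=lambda p: p["src"] == "10.0.0.9"))
```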

The Simple Network Management Protocol, or SNMP, has been a very valuable addition to our tool set for network management. SNMP consists of a set of commands that are executed between agents. Those agents can reside on managed objects, such as routers and switches; they can reside in computers; and they can also reside in a host workstation that is acting as the focal point of the SNMP protocol. These agents fetch information from Management Information Bases located on the managed devices.

SNMP is a very extensible toolset. It's built on five simple commands. As you see, those commands, such as Get and Get-Next, are very rudimentary and require a wraparound of a network management system to make those functional for the network manager. The agents communicate using the SNMP protocol, and the MIBs contain the information that the agents manage. Typically, within a MIB, you'll find counters that are registering traffic and thresholds that trigger various alarms. Recently SNMP has been enlarged to include more sophisticated Management Information Bases, and in the second part of this seminar, we'll focus on a number of those. In order to have those simple commands used in a powerful fashion, there is a need for an overall network management system that can issue high-level commands which in turn are translated into the Get and Get-Nexts of life.
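
As a concrete illustration of a single Get operation, here is a sketch that assumes the third-party pysnmp package (its 4.x hlapi interface); the target address and community string are placeholders.

```python
# A single SNMP Get, assuming the third-party pysnmp package (4.x "hlapi"
# interface) is installed; the target host and community string are placeholders.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

error_indication, error_status, error_index, var_binds = next(
    getCmd(SnmpEngine(),
           CommunityData("public", mpModel=0),             # SNMPv1 community
           UdpTransportTarget(("192.0.2.1", 161)),          # example agent
           ContextData(),
           ObjectType(ObjectIdentity("SNMPv2-MIB", "sysDescr", 0)))
)

if error_indication or error_status:
    print("SNMP error:", error_indication or error_status.prettyPrint())
else:
    for oid, value in var_binds:
        print(oid.prettyPrint(), "=", value.prettyPrint())
```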

SNMP has a number of strengths. It is a simple protocol. That has allowed it to be ported to a wide variety of network devices, including soda machines. Secondly, SNMP is ubiquitous. You can find SNMP in almost all products being manufactured today. Thirdly, it is a very extensible set of protocols and has been enlarged over time to give us a much more powerful tool. At the same time, SNMP has a number of major weaknesses. There is no security within SNMP itself. That means that a clueful user can break a network by using the monitoring tools themselves. We'll see that that is remedied in a later version of SNMP. SNMP is not a particularly efficient protocol. In order to get a large amount of information from a managed node, I may have to issue Get, Get-Next, Get-Next, Get-Next commands repeatedly. Lastly, SNMP has no analytic tools itself. It has no mechanism for displaying in a graphical interface what's happening on the network. It has no tools for distilling the large amounts of data it generates into a cogent set of specific analyses that will help the diagnostic process.
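
The round-trip cost of repeated Get-Next operations is easy to see when walking a table. This sketch again assumes the pysnmp 4.x hlapi; the agent address and community string are placeholders.

```python
# Walking a table with repeated Get-Next operations shows the round-trip cost
# described above. Again assumes pysnmp 4.x hlapi; host and community string
# are placeholders.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, nextCmd)

requests = 0
for (error_indication, error_status, _idx, var_binds) in nextCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=0),
        UdpTransportTarget(("192.0.2.1", 161)),
        ContextData(),
        ObjectType(ObjectIdentity("IF-MIB", "ifDescr")),
        lexicographicMode=False):                 # stop at the end of the column
    if error_indication or error_status:
        break
    requests += 1                                 # one Get-Next round trip per row
    for oid, value in var_binds:
        print(oid.prettyPrint(), "=", value.prettyPrint())

print("Get-Next round trips issued:", requests)
```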

RMON represents the next generation of network monitoring. RMON is a particular MIB that contains thresholds and counters at much higher levels of sophistication than are generally available in MIB I or MIB II. These RMON devices and databases can be embedded in network components such as routers and hubs, or are also available as stand-alone probes to be connected directly to the network. RMON permits historical graphing of information over some lengthy time period. RMON also has counters that enable more sophisticated statistical analyses. RMON is in some sense a replacement for network analyzers, in that it is able to capture Ethernet-level traffic patterns and to include triggers, slices, and capture buffers much as you would find inside a network analyzer. RMON is the first network MIB. Previously, MIBs were oriented toward examining the kinds of traffic entering the device. A network MIB, on the other hand, sits out there and monitors all the traffic on the network, not just the traffic intended for a particular device. Lastly, RMON is the first configurable MIB. I'm allowed, from the workstation that manages RMON, to go in and set the frequency at which statistical data is gathered and the thresholds that trigger certain capture mechanisms.

RMON 2 is a relatively recent development that allows the RMON MIB to include parameters at higher levels of the protocol stack. For example, using RMON 2, I can ask the question, "How much of the traffic on my network is Novell versus IP?" Indeed, with RMON 2 I can actually go out and ask, "How much traffic is being generated by a particular application that an end user is using?" That sophisticated tool is very helpful as we start to engineer more complicated environments. Another feature of RMON 2 is its ability to watch the processes that go on in a network in translating machine addresses into next-hop paths and to determine whether or not the routing is being done in a cogent fashion. One other capability in this vein is to detect duplicate IP addresses on a network. It turns out that in practice, duplicate IP addresses are the most common cause of network failure today. RMON 2 also provides a level of abstraction among the various vendors making network management tools, and so it gives us interoperable probe configuration, where, sitting at my management workstation, I can configure the various RMON devices out there without knowing a particular vendor's orientation toward configuration. Lastly, not only can RMON provide me traffic flows per protocol and per application, but it can say that the traffic from node X to node Y, relative to a particular application on node X interacting with node Y, is creating a certain percentage of my overall network traffic. This helps identify network abuse.
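
The kind of per-protocol breakdown RMON 2 provides can be mimicked over flow records in a few lines of Python; the flow data below is fabricated for illustration.

```python
# The kind of per-protocol breakdown RMON 2 provides, sketched over a handful
# of fabricated flow records (protocol name and byte count).

from collections import Counter

flows = [("IP", 120_000), ("Novell", 45_000), ("IP", 300_000),
         ("AppleTalk", 8_000), ("Novell", 60_000)]

by_protocol = Counter()
for protocol, byte_count in flows:
    by_protocol[protocol] += byte_count

total = sum(by_protocol.values())
for protocol, byte_count in by_protocol.most_common():
    print(f"{protocol:10s} {byte_count:>8d} bytes  {100 * byte_count / total:5.1f}%")
```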

A Network Management System tries to integrate all of the management functions under a single umbrella. These include both routine activities, such as detecting network faults, tracking problems, and gathering data, and sophisticated network management functions, such as continuously analyzing network performance and reporting key findings such as, "I think it's time to change from shared Ethernet to switched Ethernet." However, a Network Management System will not give you security tools or accounting tools. Those need to come from other applications.

A Network Management System will focus a great deal on collecting data from the various network devices. Some devices do not respond directly to SNMP and so have proxies created that interpret the SNMP commands into the vernacular of those unusual network devices. A second component of a Network Management System is a graphical status display. This display allows an operator to easily observe what links may be down and then to connect those outages to the services needed to remedy them; for example, by double-clicking on a red line that indicates a down circuit, an operator should be able to pull up the locations of those circuits and the point of contact for them, and to track any trouble tickets that may have already been issued on that outage. A third element of a Network Management System is a relational database that stores and retrieves all of the data coming into the NMS. A fourth major element of a Network Management System is a set of statistical tools that distill the large amount of data that comes in from SNMP into a refined set of analyses for the network manager to utilize. A well-crafted Network Management System should have a modular design that permits expansion and customization of the various elements. And lastly, a Network Management System should interact with the trouble ticket system so that outages and network failures can be easily entered into a trouble queue and then processed.
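
A toy version of such a collector, polling a few devices and storing readings in a relational database (SQLite from the Python standard library), is sketched below; the poll() function and device names are stand-ins for real SNMP queries and real equipment.

```python
# A toy polling loop in the spirit of an NMS collector: gather one reading per
# device and store it in a relational database (SQLite from the standard
# library). The poll() function and device names are stand-ins for real SNMP
# queries and real equipment.

import sqlite3, time, random

def poll(device: str) -> int:
    """Stand-in for an SNMP query; returns a fabricated octet counter."""
    return random.randint(0, 10**6)

db = sqlite3.connect("nms.db")
db.execute("""CREATE TABLE IF NOT EXISTS samples
              (device TEXT, polled_at REAL, in_octets INTEGER)""")

for device in ["router-a", "switch-b", "hub-c"]:          # example device names
    db.execute("INSERT INTO samples VALUES (?, ?, ?)",
               (device, time.time(), poll(device)))
db.commit()

for row in db.execute("SELECT device, in_octets FROM samples"):
    print(row)
```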

One of the key issues in managing the network layer is designing the network properly in the first place. Of all of the costs involved in networking -- hardware, software, circuits, and people -- it is only the people cost whose price is not coming down. That suggests designing the network from the outset for efficient management. Secondly, when a network fails, it's important to observe the mode of failure. In some cases, a failure is accompanied by dropped packets. In other cases, a router or a router card may seize and completely fail. It's important that our modes of failure be graceful and that, when we're trying to diagnose those failures, we have baselines that help us identify what normal operations should look like. We are moving into an environment where we are attempting to connect LANs together using ATM switches and Ethernet switches. While this gives us a great deal of power and sophistication, at the same time it makes managing the environment and managing the switches that connect these virtual LANs a more challenging task. As alluded to earlier, we have to cope with those protocols which run on our networks but which we do not route through our routers. Tunneling and encapsulation strategies will be significant in realizing effective use of the network. Lastly, we have to worry about security -- firewalls and protection -- to maintain our internal environment.

At the top level of the protocol stack is what we'll call a set of enterprise-wide services. One obvious need is to coordinate among all those copies of Novell that are out there, or all those copies of AppleShare. We need to keep the versions consistent. We need to have the addressing consistent so that LANs that were initially established in an isolated environment can be joined together as a virtual LAN without collisions in terms of addressing. We need to make sure that those LANs communicate across our campus fabric with cogent routing. Secondly, we often want to provide e-mail and Web service for the entire enterprise, and to do that in a uniform fashion. Thirdly, we often want to have printers located on the network be able to respond to multiple protocols and multiple print stations. For example, a printer should be able to print from both IP applications and Novell applications. Fourth, there's a set of services that are very useful to running a modern university. For example, list serving and list processors are very useful. It is also helpful to have a mechanism so that when people send attachments to e-mail messages, those attachments can be read and deciphered by a variety of machines. Directory services tell people and computers where resources are located. Directory services are a very volatile area, not only for the technologies involved, but for some of the legal considerations. Typically, you may want a directory service to identify certain fields of information for internal users of the network; but for people coming in from outside the network, you may want to restrict the information available from that directory. Lastly, it's very helpful to have a network ID, a single log-on that allows a user to connect to multiple servers and multiple processes all across the enterprise.

Out of these services comes a rich set of management needs. We need to make our network operating systems on our LANs compatible. We need to have standard stacks of middleware, of identity servers and security servers and protocol emulation tools that are consistent from machine to machine to create a sustainable environment. As we indicated, we have moved from an environment with the user having a single external enterprise-wide account to needing accounts for multiple machines and multiple services. Users, however, want a single password that can access those multiple hosts. We need to adopt a comprehensive approach to not only security, but to the other side of the security coin, privacy issues. We may be in an environment where there are multiple protocols being spoken by multiple machines that we want to connect to. We have to distill those multiple interfaces down into a single interface for the user to utilize. Conservation of interface is becoming a key issue with the increase in diverse applications out there. While we all instruct our users to do backups on their local desktops, few of us actually do so. One remedy to that Achilles heel in the modern environment is to create an institutional file system that embraces both the desktop and central servers and that facilitates the backups of desktops. We need to operate our directory services in a way that provides information to those who need to know but limit the general information available.

How do we approach these problems? Well, increasingly universities are turning towards global coordination of their network operating systems, mandating that LAN systems such as Novell be at a certain version level or higher. Universities are turning to gateways that are operated on the enterprise level to convert from, for example, the SMTP mail protocol to the X.400 protocol. One of the best solutions to network abuse is to educate the user on how to use the tools on the desktop. A NIC, or Network Information Center, can provide a major assistance to the users in their operations of their computers. In order to breed convergence of desktop software, it is often very valuable to create licensing processes, both manual and automatic, so that users are encouraged to use standard pieces of software. Lastly, at the enterprise level, it is often useful to operate a single Internet gateway. That Internet gateway may want to protect internal resources from access by external users. Typically we use proxy servers at that point to allow the external user to connect to the official Internet gateway and then have that Internet gateway in turn connect to hidden servers behind it on the campus.

One of the issues that we deal with at the enterprise-wide level is where to locate key servers and services. Where do we put the Domain Name Service within our enterprise environment? How do we handle routing? Where is the Web server physically located, and where is it logically located in terms of end-user inquiries? A second set of services to be provided at the enterprise-wide level includes global accounts and logicals. Global accounts allow me to log into a variety of resources from a single authorization. Logicals allow us to address resources in English terms. For example, I may want to spool some printing to the library printer without designating it via a long sequence of machine addresses. Group classes allow me to access a variety of information depending upon what permissions I have; so typically, if I am a student who needs to access a variety of databases according to the classes I'm currently enrolled in, I'd like to have those privileges accrue to me as a result of who I am. Another issue in engineering enterprise-wide services is where to locate executable programs. Placing loadable modules on the desktop, on the server, or perhaps splitting them between the server and the desktop is a key design decision that ensures adequate performance of the application and the network.

Let's look at these client-server architecture issues a bit more. The network can have a tremendous impact on a client-server process, and a client-server process can truly demolish a network if it's not constructed properly. There is a variety of ways to build client-server applications. One can put presentation only on the client. One can split presentation between the client and the server. One can put execution on the client; one can put execution on the server; or one can perhaps distribute execution between those two machines. One back-end issue for client-server architecture that has a significant performance impact on networks is whether you have developed a two-tier model for client-server or a three-tier model. In a two-tier model, the server contains the application and the database that the application is working on. In a three-tier model, the database is separate from the application. Since the application and the database typically exchange a great deal of traffic, if one goes to the three-tier model, one needs to engineer sufficient network performance between the application and the datasets. One other design issue in the client-server architecture is whether one builds fat clients or thin clients. A thin client will have just a modest amount of executable code on the desktop. One may do that because of storage limitations on the client, or because of the need to keep logic that might be volatile centrally stored rather than on the client. Fat clients tend to have their logic as well as their presentation layers stored on the client. While that reduces network impact, changes in the logic then need to be distributed to all of the client sites.

Cutting across all three layers -- the physical layer, the network layer, and the enterprise layer -- there is a need for a coordinated organization; an organization that does adequate capacity planning and that manages the operations of the network. When one looks at the modern network with its multiple layers, it is likely that those layers will be addressed by different staff within network operations. Training our employees and retaining them once they are trained is an ongoing issue that we all need to face. We've talked already about managing the expectations of the users. Managing the information in the directories, making sure that it is correct, and making sure that it is only visible to appropriate eyes, is another major component that has to be addressed in network management. While most of us have gravitated toward the technologies of networks, network management today also includes a series of policies that need to be in place for networks to be perceived as well-behaved. We need rules on e-mail access and rules on what kind of content can be posted on a Web page. Under what conditions will we look at a person's private e-mail? These policy developments will likely take place at other points within the institution, but the policy development needs to be informed by the technological options that are available. That same degree of technological information applies to assessing the cost and risk of security. There are lots of threats out there. Remedying some of those threats will either be very expensive or reduce the capabilities that the average user has at their disposal. Somebody needs to look with care and precision at what the risk is and what the cost of eliminating that risk is.

Typically, operations includes support and training services for both the network managers and the end users, operating a 7x24 network operations center, and developing standards for network operating systems and for desktop software and hardware, and then promulgating those standards widely.

We face a number of strong challenges in this area. As in the past several years, we need to scale to the load presented. Secondly, many universities are reexamining how to develop help desks. Some have moved toward significant decentralization of help desks, while others are saying, "To get the economy of scale, we need to centralize our help operations." We have our technology layered, and we have units that support those technologies layered horizontally, but the user wants to see a vertical integration of that technology. Many of us now are trying to serve users who are on the road or who are mobile. To do that requires modifying firewalls, addressing security issues, and addressing performance issues that we never had to consider before. Lastly, we need to create a change in our culture inside network management. We need to become proactive rather than reactive. We need to do diagnostics that indicate that load is increasing before performance deteriorates. We need to discern outages before users discern outages. We need to become proactive in recommending to users what they install on their desktops.

How do we go about this? One very useful approach is to develop intermediate support staff, staff that sit in departments on the LANs, providing a multitude of services for those departments. Those LAN managers provide a first point of contact for end users, a point of contact for the central organization, and a buffer between the two. In many instances, end users may not have LAN managers available to them. It is then imperative to provide end users with as much information as possible to allow them to be part of the diagnostic process rather than just the source of complaints. Lastly, we need to take sophisticated technical people and infuse in them a service culture. As our services proliferate, we need to package those services into meaningful units for end users to consume. Putting out a listing of the services we offer is putting out an invitation to confusion. We need to help users navigate through our offerings and find the right tools for the right purposes.
