OPERATING SYSTEM

An operating system, or OS, is a software program that enables the computer hardware to communicate and operate with the computer software. Without a computer operating system, a computer would be useless.

As computers have progressed and developed, so have operating systems. Below is a basic list of the different types of operating systems, with a few examples of each. Many operating systems fall into more than one of these categories.

GUI - Short for Graphical User Interface, a GUI Operating System contains graphics and icons and is commonly navigated by using a computer mouse. Below are some examples of GUI Operating Systems.

System 7.x
Windows 98
Windows CE

Multi-user - A multi-user operating system allows multiple users to use the same computer, either at the same time or at different times. Below are some examples of multi-user operating systems.

Linux
Unix
Windows 2000

Multiprocessing - An operating system capable of supporting and utilizing more than one computer processor. Below are some examples of multiprocessing operating systems.

Linux
Unix
Windows 2000

Multitasking - An operating system that is capable of allowing multiple software processes to run at the same time. Below are some examples of multitasking operating systems.

Unix
Windows 2000

Multithreading - Operating systems that allow different parts of a software program to run concurrently. Operating systems that would fall into this category are:

Linux
Unix
Windows 2000
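
As a rough illustration of the multithreading idea described above, the following Python sketch runs different parts of one program as separate threads. The function names and sample data are made up for this example, and it is not tied to any particular operating system from the lists above.

```python
import threading

def count_words(text, results, key):
    """Worker: store the number of words in `text` under `key`."""
    results[key] = len(text.split())

def run_concurrently(texts):
    """Start one thread per text and wait for all of them to finish."""
    results = {}
    threads = []
    for key, text in texts.items():
        t = threading.Thread(target=count_words, args=(text, results, key))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait until every thread has completed
    return results

if __name__ == "__main__":
    print(run_concurrently({"a": "one two three", "b": "four five"}))
```

The operating system decides when each thread actually runs; the program only starts the threads and waits for them with join().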

UNIX

Unix (officially trademarked as UNIX, sometimes also written as Uɴɪx in small caps) is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, Michael Lesk and Joe Ossanna. The Unix operating system was first developed in assembly language, but by 1973 had been almost entirely recoded in C, greatly facilitating its further development and porting to other hardware. Today's Unix system evolution is split into various branches, developed over time by AT&T as well as various commercial vendors, universities (such as University of California, Berkeley's BSD), and non-profit organizations.

The Open Group, an industry standards consortium, owns the UNIX trademark. Only systems fully compliant with and certified according to the Single UNIX Specification are qualified to use the trademark; others might be called Unix system-like or Unix-like, although the Open Group disapproves of this term. However, the term Unix is often used informally to denote any operating system that closely resembles the trademarked system.

During the late 1970s and early 1980s, the influence of Unix in academic circles led to large-scale adoption of Unix (particularly of the BSD variant, originating from the University of California, Berkeley) by commercial startups, the most notable of which are Solaris, HP-UX, Sequent, and AIX, as well as Darwin, which forms the core set of components upon which Apple's OS X, Apple TV, and iOS are based. Today, in addition to certified Unix systems such as those already mentioned, Unix-like operating systems such as MINIX, Linux, and BSD descendants (FreeBSD, NetBSD, OpenBSD, and DragonFly BSD) are commonly encountered. The term traditional Unix may be used to describe an operating system that has the characteristics of either Version 7 Unix or UNIX System V.

LINUX (OR GNU/LINUX)

Linux (or GNU/Linux) is a Unix-like operating system that was developed without any actual Unix code, unlike BSD and its variants. Linux can be used on a wide range of devices from supercomputers to wristwatches. The Linux kernel is released under an open source license, so anyone can read and modify its code. It has been modified to run on a large variety of electronics. Although estimates suggest that Linux is used on 1.82% of all personal computers, it has been widely adopted for use in servers and embedded systems (such as cell phones). Linux has superseded Unix in most places, and is used on the 10 most powerful supercomputers in the world. The Linux kernel is used in some popular distributions, such as Red Hat, Debian, Ubuntu, Linux Mint and Google's Android.

The GNU project is a mass collaboration of programmers who seek to create a completely free and open operating system that is similar to Unix but written with completely original code. It was started in 1983 by Richard Stallman, and is responsible for many of the parts of most Linux variants. Thousands of pieces of software for virtually every operating system are licensed under the GNU General Public License. Meanwhile, the Linux kernel began as a side project of Linus Torvalds, a university student from Finland. In 1991, Torvalds began work on it, and posted information about his project on a newsgroup for computer students and programmers. He received a wave of support and volunteers who ended up creating a full-fledged kernel. Programmers from GNU took notice, and members of both projects worked to integrate the finished GNU parts with the Linux kernel in order to create a full-fledged operating system.

MICROSOFT WINDOWS

Microsoft Windows is a family of proprietary operating systems designed by Microsoft Corporation and primarily targeted at Intel architecture based computers, with an estimated 88.9 percent total usage share on Web-connected computers. The newest version is Windows 8 for workstations and Windows Server 2012 for servers. Windows 7 recently overtook Windows XP as the most used OS.

Microsoft Windows originated in 1985 as an operating environment running on top of MS-DOS, which was the standard operating system shipped on most Intel architecture personal computers at the time. In 1995, Windows 95 was released, which used MS-DOS only as a bootstrap. For backwards compatibility, Win9x could run real-mode MS-DOS and 16-bit Windows 3.x drivers. Windows Me, released in 2000, was the last version in the Win9x family. Later versions have all been based on the Windows NT kernel. Current versions of Windows run on IA-32 and x86-64 microprocessors, although Windows 8 will support the ARM architecture. In the past, Windows NT supported non-Intel architectures.

Server editions of Windows are widely used. In recent years, Microsoft has expended significant capital in an effort to promote the use of Windows as a server operating system. However, Windows' usage on servers is not as widespread as on personal computers, as Windows competes against Linux and BSD for server market share.

Microsoft Windows is a series of graphical interface operating systems developed, marketed, and sold by Microsoft. Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces (GUIs). Microsoft Windows came to dominate the world's personal computer market with over 90% market share, overtaking Mac OS, which had been introduced in 1984. The most recent client version of Windows is Windows 8; the most recent mobile client version is Windows Phone 8; the most recent server version is Windows Server 2012.

ADVANTAGES OF SECURITY INSTALLATION

Network security is a major challenge for network operators and internet service providers, who must protect their networks from attack by intruders. There are many security tools available for this purpose; however, security software is the most convenient and easiest way to deal with the problem of network security.

Important Considerations before Choosing Software

It is difficult to stop hackers and crackers from entering your system if you do not know how they are getting into your PC. Choose software that is capable of scanning for the problem first and then providing the cure. Choose software that is easy to install and knows how to fix the problems it finds. Software that also provides file-based scanning should be preferred. Choose software that supports Active Directory queries. The chosen software should be able to check and examine ports, shares, files and users. Software should be scalable so that it can provide assistance on any operating system. Security software should be capable of locating multiple vulnerabilities in the database without additional cost. Finally, always choose software that is regularly upgraded.

Advantages of Security Software

Whenever you deploy a network, you typically install security software on every single computer in the network. However, we still need to provide strong security for the network as a whole. Providing security for the entire network has several advantages. First of all, network security software installed on the network provides a centralized defense against security threats. Changes made centrally are applied automatically to the entire network. A desktop protected only on its own is more prone to attack by worms and viruses. Hence installing security software on a server has two benefits. First, the network administrator does not have to install and update the security software for each individual user. Second, the administrator can protect the entire network from unauthorized users. Another advantage of security software is that you have the option of placing a firewall on each crucial port. Installing antivirus software on each individual computer of the network would slow down the machines and the network as a whole; a centralized security mechanism helps eliminate this problem. It is also time consuming to install and upgrade antivirus software on individual computers, so using a centralized security system saves the network administrator a lot of time.

Types of Network Security Software

Most new computers, laptops and palmtops come with a built-in antivirus mechanism. However, there is still a need to upgrade this software and scan your PC for security threats. Generally speaking, there are many types of security systems, such as intrusion detection systems, antivirus software, firewalls, file-based systems and heuristic systems. An intrusion prevention system is a set of activities that the software performs in order to filter worms and viruses out of the system. Intrusion prevention systems improve network protection, prevent sudden attacks and enhance virus protection. Cisco’s intrusion protection system is considered a practical approach for business networks. There are many other names on the market that offer viable protection against security threats. Well-known antivirus software includes AOL Active Virus Shield, Avast, AVG, Avira, AVZ, BitDefender, BullGuard, Clam AntiVirus, F-Secure, Norman, Microsoft Security Essentials and Symantec Norton. Besides expensive software, there is a great deal of free and open source software that can be downloaded onto the network server. Moreover, almost all vendors offer software trials before you purchase. A trial can help you choose the most suitable security software for your network.

USE OF INTERNET

Since the internet became popular, it has been used for many purposes. With the help of the World Wide Web and websites, the internet has become very useful in many ways for ordinary people. Today the internet has brought the globe into a single room. From news from every corner of the world and a wealth of knowledge to shopping and buying tickets for your favorite movie, everything is at your fingertips.

Here is a list of some common uses of the internet:

1) Email: Using the internet, we can now communicate in a fraction of a second with a person sitting in another part of the world. For better communication we can use e-mail, and we can chat for hours with our loved ones. There are plenty of messenger and email services offering this for free. With the help of such services, it has become very easy to establish a kind of global friendship in which you can share your thoughts and explore other cultures.

2) Information: The biggest advantage the internet offers is information. The internet and the World Wide Web have made it easy for anyone to access information of any type, as the internet is flooded with it. Information on almost any topic is available on the internet.

3) Business: World trade has seen a big boom with the help of the internet, as it has become easier for buyers and sellers to communicate and to advertise their sites. Nowadays many people use online classified sites to buy, sell or advertise their products or services. Classified sites save a lot of money and time, so many people choose them as a medium to advertise their products. There are many classified sites on the web, such as craigslist, Adsglobe.com and Kijiji.

4) Social Networking: Today social networking sites have become an important part of the online community. Almost all users are members and use them for personal and business purposes. They are a good place to network with the many entrepreneurs who come there to begin building their own personal and business brands.

5) Shopping: In today's busy life, most of us are interested in shopping online. Nowadays almost anything can be bought over the internet. In countries like the USA, most consumers prefer to shop from home. There are many shopping sites on the internet, such as amazon.com and Dealsglobe.com. People also use the internet to auction goods; there are many auction sites online where almost anything can be sold.

6) Entertainment: On the internet we can find all forms of entertainment, from watching films to playing games online. Almost anyone can find the right kind of entertainment for themselves. Music, hobbies, news and more can be found and shared on the internet, and numerous games can be downloaded for free.

7) E-Commerce: E-commerce is the term used for any type of commercial activity or business deal that involves the transfer of information across the globe via the internet. It has become associated with almost any kind of shopping, and it covers a remarkable range of products, from household needs and technology to entertainment.

8) Services: Many services are now provided on the internet, such as online banking, job seeking, purchasing tickets for your favorite movies, guidance services on an array of topics in every aspect of life, hotel reservations and bill payment. Often these services are not available offline, or cost more when they are.

9) Job Search: The internet makes life easy for both employers and job seekers, as there are plenty of job sites that connect them.

10) Dating/Personals: People are connecting with others through the internet and finding their life partners. The internet helps not only to find the right person but also to continue the relationship.

GRAPHIC VISUALIZATIONS TECHNIQUES

When you want to share information beyond just words and phrases, graphic visualizing techniques can create images designed to quickly clarify and communicate. Word mapping is a common visualization technique. If you've seen the movie "Avatar," you've witnessed three-dimensional graphic visualizations. The most advanced techniques now offer practical application of science fiction-like methods.

Mind Mapping

Graphic visualization techniques are computer-generated displays of data in the form of various types of images. Mind mapping is the term used by Smashing Magazine (and others) to describe an increasingly popular form of visualized data. Mind maps summarize basic information, consolidate information from various separate sources and present "information in a format that shows the overall structure of your subject as a type of affinity diagram," according to MindTools.com. An effective mind map shows the overall "big picture" view of a subject and, at the same time, offers the tiny details that make up the larger view.

Word Mapping

Another popular use of graphic visualization is word mapping. For example, when the President of the United States delivers a State of the Union message, you can program a computer to analyze which words are used most often. During a recent speech, the words "America" and "future" were among the most used. A word map of all the words employed would picture those two words in the largest boxes with smaller and smaller boxes for each less used word. You can quickly get a picture of the basic themes of the speech. That's a two-dimensional graphic visualization technique applied practically to yield basic information about an important speech.
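
The counting step behind a word map can be sketched in a few lines of Python. This is only an illustration under simple assumptions: words are split on non-letter characters, and the box size is scaled linearly so the most frequent word gets the largest box. The function names and the sample sentence are invented for the example.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the `top_n` most common words in `text` with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

def box_sizes(frequencies, max_size=100):
    """Scale each count so the most frequent word gets `max_size`."""
    if not frequencies:
        return {}
    largest = frequencies[0][1]  # most_common() lists the top word first
    return {word: round(max_size * count / largest)
            for word, count in frequencies}

if __name__ == "__main__":
    sample = "America future America hope future America"
    print(box_sizes(word_frequencies(sample)))
```

A drawing program would then use these sizes to decide how large to make the box for each word.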

3-D Graphics

Many modern graphic visualizations are presented in three-dimensional versions. The movie-going experience is probably the most common example of witnessing the development of computer generated visual effects. But it's not just for entertainment. Such advanced, sophisticated presentations of information are valuable in the scientific community, especially in analyzing patterns and sets of data in medicine, engineering, applied mathematics and similar fields. Because of the demand for experts in this field, colleges and universities across the nation now offer degrees in computer graphics visualization techniques.

Immersive Virtual Reality

The most intriguing example of a graphic visualization technique is something called "immersive virtual reality." A head-mounted computer screen allows the user to move about in a virtual space while sensors track movement and while the user holds a device allowing interaction with "objects" in the virtual reality of the 360-degree simulation. If this sounds like the "holodeck" from the Star Trek series, indeed, that's the direction of this most sophisticated graphic visualization tool. And it has real-world uses: a computer program along these lines is now being developed to teach nurses the practical application of cardiopulmonary resuscitation.

HARDWARE AND SOFTWARE

Computer hardware is any physical device, something that you are able to touch, while software is a collection of instructions and code installed on the computer that cannot be touched. For example, the computer monitor you are using to read this text and the mouse you are using to navigate this web page are computer hardware. The Internet browser that allowed you to visit this page and the operating system that the browser is running on are software.

COMPUTER NETWORKING

Computer networking is probably a familiar term, as it is now used almost everywhere. It started decades ago; the original concept was to connect two separate computers with each other for communication. Within a decade, this concept was adopted commercially by most computer users, institutions, schools and IT departments, and networking computers became a widespread necessity.

On a broader scale, a computer network is a collection of multiple computers, printers, scanners and other devices connected so that they can communicate and share information with each other. Sharing files, software and so on would not be possible without networking; networks have made life easier for many people in their professions. For example, one can tell a computer to print a document without even sitting at the machine attached to the printer.

Computers are everywhere now; they are a must-have machine in every profession. If you have more than one computer at your place, having them networked is as important and useful as having the computers themselves. This section explains what computer networks are, what types of computer network exist, which type suits which purpose, and the hardware requirements, software requirements, advantages and security of computer networks. There are as many as eight types of computer network; the most commonly used is the LAN (Local Area Network), and others include the WAN (Wide Area Network) and the MAN (Metropolitan Area Network).

Local Area Network (LAN) is the technical name for a computer network that is normally set up within a single house, office or similar site. Networking device manufacturers have done, and continue to do, a great deal of work on networking, so every now and then there is a new and better way to network multiple computers. In many places the wired LAN has already been replaced by the WLAN (wireless local area network), which performs the same function as a LAN but wirelessly. The LAN is still in use, however, and understanding it is very important before moving on to newer and more advanced solutions.

Wide Area Network (WAN): A wide area network provides communication among computers that are located far from each other. The internet is a classic example of a WAN: it is a collection of a large number of computers connected together to share information with each other, and it is accessible from everywhere.

Metropolitan Area Network (MAN): A MAN is not used as commonly as WANs and LANs are. It becomes important when two remotely located offices or organizations need to be connected together to build a network among their computing systems. It covers a large area, but not as large as a WAN can.

NETWORK TYPES

Networks are all about sharing resources such as computers, servers, printers and scanners. There are two different models on which a network between computers can be formed. The choice depends entirely on the scale and usage requirements of the organization. We should always study requirements and needs before choosing a type of network; picking the wrong set of options can waste money, time and resources. The two types are peer-to-peer networking, also known as p2p, and client/server networking.

Peer to peer (p2p)

P2p is the most commonly used type of computer network. This type of network is very cost effective but supports a smaller number of computers. Ten to fifteen computers can be connected to each other using the p2p networking model without problems; larger numbers often cause problems. All computers have the same status within the network, and no computer controls any computer other than itself; the network has no server to control and monitor it. The security level is not high, and each workstation is responsible for its own security. Using the p2p model, files can be shared among computers: videos, audio, pictures, spreadsheets and all other digital media can be sent or received within the network. Printers, scanners and an internet connection can be shared among all computers. Below is the picture showing three computers connected to each other through a hub or switch. All computers are connected to the hub through a network adapter card using cable, and the hub or switch is connected to the internet to pass it on to the connected computers. You can see there is no server involved in this diagram; the individual computers all connect to the hub, forming a P2P network.

Limitations of the P2P networking model:

Before deciding to implement the P2P model, one must know its limitations; finding out later can be very frustrating. It is highly recommended that the people in your organization sit together and discuss the needs. Peer-to-peer looks very simple, cost effective and attractive, yet it can severely limit progress.

Peer-to-peer networks are designed for a limited number of computers; they start to cause issues when there are more than 15 computers.

High security levels cannot be achieved using p2p networks, so if an organization has security concerns, p2p is not a good choice.
An organization's growth will outstrip a p2p network; it will not support a number of computers that grows beyond fifteen.

Regular training is required for the users of a p2p network. A p2p network is controlled by its computers, and the computers are controlled by people, so a small mistake by one user can hold up the work of other users on the same network.

Client Server Network Model:

Choosing the right kind of networking model is very important for an organization. If you are using a small number of computers and do not see any need to increase the number beyond 15, then the peer-to-peer networking model is fine; but if you are a bigger organization or expect the network to grow, the client/server model is designed for that.

The difference between the p2p and client/server models is that p2p does not have any device or computer that controls the computers on the network, whereas the client/server model has one dedicated computer called the server, or dedicated server. All computers are connected to a hub, and the hub is connected to the dedicated server. The server is responsible for acting on the requests sent to it by clients. For example, a server can act as a print server: if a client requests a printout of a document, the server sends the print command to the printer and the document is printed. In the same way, all files are stored on the server and not on the client computer, so the same user can retrieve data from any other computer on the same network. This concept is known as centralization; it enables the server to keep user profiles, data, software and so on intact and organized.
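
The request-and-response pattern and the idea of centralization can be illustrated with a toy Python model. This is an in-process sketch only, with no real network traffic; the class names, request fields and file name are all invented for the example.

```python
class Server:
    """Toy dedicated server: keeps every user's files in one central place."""

    def __init__(self):
        self.files = {}        # (user, filename) -> contents
        self.print_queue = []  # documents waiting for the printer

    def handle(self, request):
        """Carry out one client request and return a reply."""
        kind = request["type"]
        if kind == "store":
            self.files[(request["user"], request["name"])] = request["data"]
            return "stored"
        if kind == "fetch":
            return self.files.get((request["user"], request["name"]))
        if kind == "print":
            self.print_queue.append(request["name"])
            return "queued"
        return "unknown request"


class Client:
    """Toy workstation: holds no files itself, only talks to the server."""

    def __init__(self, server, user):
        self.server = server
        self.user = user

    def store(self, name, data):
        return self.server.handle(
            {"type": "store", "user": self.user, "name": name, "data": data})

    def fetch(self, name):
        return self.server.handle(
            {"type": "fetch", "user": self.user, "name": name})

    def print_document(self, name):
        return self.server.handle(
            {"type": "print", "user": self.user, "name": name})


if __name__ == "__main__":
    server = Server()
    desk_pc = Client(server, "alice")    # one workstation
    lab_pc = Client(server, "alice")     # same user, different workstation
    desk_pc.store("report.txt", "quarterly figures")
    print(lab_pc.fetch("report.txt"))    # file is found from the other machine
```

Because the file lives on the server, the same user can fetch it from a second workstation, which is the point of centralization.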

A normal computer can also be configured as a server and should perform server tasks adequately, but if the network is expected to grow and many computers need to be attached to it, a proper server may be needed to take over the network.

You can see this in the diagram below. All the workstations (clients) are attached to the server; sometimes a hub is involved, but in this case there are just clients and a server.

Features of Server:

Servers are powerful machines compared with normal desktop computers. They are meant to provide computing power for the entire network. A large network can only be controlled by dedicated servers, as they have the higher specifications needed to support it. Servers can have better processing speed, with support for multiple processors. Server machines have more RAM to load and execute software with ease. They have more advanced network cards installed for faster data transfer. Their hard drives are much bigger, to store data for all clients. Hardware can be plugged in and removed while the server is on, which helps keep the network stable; hardware such as hard disks can be removed and attached as needed.

Server OS:

Operating systems are also specially designed for servers. A server OS has many more features, such as file serving, print serving, data backup and enhanced security. A few major server operating systems are commonly used: Windows Server (NT, 2000, 2003), Linux and Novell NetWare. Windows Server 2003 is more powerful and offers much higher security levels, and Linux servers are also regarded as providing strong security for networks.

Advantages of Computer Network

Computer networks have greatly benefited education, business and many other organizations. They can be seen everywhere, and they connect people all over the world. Computer networks provide some major advantages that make life easier. Some of them are listed below.

Communication

Communication is one of the biggest advantages provided by computer networks. Networking technology has improved the way people communicate: people from the same or different organizations can communicate in a matter of minutes to coordinate their work. In offices and organizations, computer networks serve as the backbone of daily communication from the top to the bottom level of the organization. Different types of software can be installed for transmitting messages and emails at high speed.

Data sharing

Another great advantage of computer networks is data sharing. All kinds of data, such as documents, files, account information, reports and multimedia, can be shared with the help of computer networks. Hardware sharing and application sharing are also allowed in many organizations, such as banks and small firms.

Instant and multiple access

Computer networks allow multiple access: many users can access the same information at the same time. Immediate commands, such as print commands, can be issued with the help of computer networks.

Video conferencing

Before the arrival of computer networks, video conferencing did not exist. LANs and WANs have made it possible for organizations and businesses to hold live video conferences for important discussions and meetings.

Internet Service

Computer networks provide internet service over the entire network. Every computer attached to the network can use the high-speed internet connection. Networks also allow fast processing and distribution of the workload.

Broadcasting

With the help of computer networks, news and important messages can be broadcast in a matter of seconds, which saves a lot of time and effort. People can exchange messages immediately over the network at any time, 24 hours a day.

Photographs and large files

A computer network can also be used to send large data files, such as high-resolution photographs, to more than one user at a time.

Saves Cost

Computer networks save organizations a lot of money in different ways. Links built through computer networks transfer files and messages to other people immediately, which reduces transportation and communication expenses. Networking also raises the standard of the organization because of the advanced technologies that are used.

Remote access and login

Employees of the same or different organizations connected by a network can access it by simply entering the network's remote IP or web remote IP. In this way, the communication gap that existed before computer networks no longer exists.

Flexible

Computer networks are quite flexible: all of their topologies and networking strategies support the addition of extra components and terminals to the network. They are equally suitable for large and small organizations.

Reliable

Computer networks are reliable where the safety of data is concerned. If one of the attached systems fails, the same data can be retrieved from another system attached to the same network.

Data transmission

Data is transferred at high speed even when one or two terminal machines fail to work properly. Data transmission is seldom affected in computer networks, and almost complete communication can be achieved even in critical scenarios.

Provides broader view

For ordinary people, computer networks are a way to share their individual views with the rest of the world.

INTRODUCTION TO COMPUTER NETWORK DEVICES

Learning about network types and configuration remains incomplete unless we get to know the devices that enable communication between computers in a network. Without communication devices, networks cannot be formed, so knowing their names and uses is equally important. To build a LAN, the following network communication devices are required:

• NIC Adapters
• Routers
• Hubs
• Switches
• Gateways
• Modems
• Networking Cables

NIC Adapters:

NIC stands for Network Interface Card; this is the most important device in building a network. These adapters are a common part of the computers used in our homes and offices. A NIC is also referred to as a LAN card, i.e. a local area network card. Communication media (cables) are attached to this card to build the network. Each card has a unique MAC address. To build a network, a unique IP address is assigned to the LAN card so that it can begin communicating. When building a WLAN, a wireless card is used instead of a LAN card. Its functionality is the same as that of a simple LAN card; it is just a wireless communication device that connects to a router for communication.

Routers:

A router is an intelligent device which routes data to destination computers. It connects two different logical and physical networks together. In a small network, the server is connected to the router along with the clients. Without routers, communication between networks is not possible; the router is the soul of the network, without which the distribution of Internet and other network data to the entire network is impossible. A wireless router works in the same way in a wireless network, performing all of the same functions without a medium such as cables. A router relies on a routing table, which stores destination network addresses together with the path used to reach each one. Major companies known for manufacturing routers and wireless routers include TP-Link, Cisco Systems, Nortel and D-Link.
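A routing table lookup can be sketched in a few lines of Python. This is only an illustration: the networks and next-hop names in ROUTES are invented, and the function next_hop is a hypothetical helper, not part of any router software. It picks the most specific matching network, which is how routers commonly choose between overlapping entries.

```python
import ipaddress

# Hypothetical routing table: destination network -> next hop (made-up values).
ROUTES = {
    ipaddress.ip_network("192.168.1.0/24"): "local LAN port",
    ipaddress.ip_network("10.0.0.0/8"): "branch-office link",
    ipaddress.ip_network("0.0.0.0/0"): "ISP uplink",  # default route, matches everything
}


def next_hop(destination):
    """Return the next hop of the most specific route that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in ROUTES if address in network]
    # The longest prefix is the most specific match.
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTES[best]
```

Under these assumptions, a packet for 192.168.1.20 stays on the local LAN, while an address that matches no specific entry falls through to the default route.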

Hubs:

On larger networks, hubs are required to build the network. All computers are connected directly to the hub, which acts as the centralized device of the network. When data is sent to the hub, it broadcasts the data to all of its ports, and the data thereby reaches the destination computer. If a hub fails, it halts the working of the entire network until it is restored to normal condition.

Switches:

A switch is another important device in larger computer networks. It is used in the same place as a hub; the difference is that a switch holds a switching table. The switching table stores the MAC address of every computer connected to the switch, so the switch sends data only to the requested address, unlike a hub, which broadcasts data to all ports. Switches can be considered an advanced form of hub.
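The difference between a hub and a switch can be illustrated with a small Python sketch. The class names, port numbers and MAC strings are assumptions chosen for the example; a real switch also ages out table entries and handles broadcast addresses, which is omitted here.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, ports):
        self.ports = list(ports)

    def forward(self, in_port, src_mac, dst_mac):
        return [port for port in self.ports if port != in_port]


class Switch:
    """Learns which MAC address is reachable on which port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port

    def forward(self, in_port, src_mac, dst_mac):
        # Learn (or refresh) the sender's location from the incoming frame.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # known destination: one port only
        # Unknown destination: fall back to hub-like flooding.
        return [port for port in self.ports if port != in_port]
```

In this sketch the switch behaves like a hub only until it has seen a frame from the destination; after that, traffic goes to a single port.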

Gateways:

As the name suggests, a gateway is a passage from one place to another. Gateways can be software or hardware devices. A gateway device connects a LAN to the Internet. One of its basic functions is to provide security to the network: with a gateway, incoming and outgoing traffic can be monitored for malicious activity that could harm network integrity.

Modems:

Modems are of two types. One is the dial-up modem, which connects to the Internet over a telephone line by dialing the ISP; the other is used to connect to DSL. The function of both types is the same: modulation and demodulation. They convert digital signals into analog and analog signals back into digital so that the signals can travel over telephone lines.

Cables:

Cables are used to connect communication devices with each other to form a network. There are different types of cable; commonly used ones are twisted-pair cable (such as CAT5, used for 10BASE-T Ethernet), coaxial cable and fiber-optic cable. Fiber-optic cable is the most expensive; it carries data as pulses of light, which allows very high transfer rates. It is a costly solution mostly adopted by the corporate sector. Recently, however, fiber-optic cable has also come into use in home networking and as a medium for connecting to the Internet.

ETHERNET

Having covered the main topics of computer networking, we now turn to Ethernet, which ties these topics together and should deepen understanding of them. This section explains LAN terminology, gives an overview of basic Ethernet networking, and introduces the IEEE and WAN standards. Other related topics can be reviewed in the main computer network section.

Anyone who knows a little about computer networks will be familiar with the IEEE standards. The IEEE is the Institute of Electrical and Electronics Engineers, a professional body that publishes technical standards. Its LAN standards are commonly known as the IEEE 802 standards, and they are widely adopted by the IT sector throughout the world. The best known of them is IEEE 802.3, known as Ethernet. Ethernet is a LAN technology used worldwide on a very large scale; almost every office, school and even home has deployed it. Xerox Corporation developed Ethernet in the early 1970s, making it one of the earliest LAN technologies. Hardware manufacturers follow these standards so that their products can operate anywhere in the world.

Ethernet in its simplest form uses a passive bus that operates at 10 Mbps. The bus is formed from coaxial cable, which connects all the PCs in the LAN. The LAN is a widely used type of computer network worldwide; a single LAN of this kind can hold up to 1024 attached computers. In practice far fewer computers are connected, depending on the size of the organization and its computing requirements. Times have changed dramatically: speed and reliability are now the most vital factors in sending, receiving and sharing information securely. Distances have grown from a single building to thousands of miles, and data, whether voice, video or other, now moves between remote locations in a fraction of a second. The Internet makes this possible, allowing businesses to share information with customers in other parts of the world.

In today’s IT world, reliable and timely access to information has become vital. Coworkers thousands of miles apart can share data, voice and video within fractions of a second, and a large number of coworkers can review research data simultaneously. The Internet allows businesses to share information and resources with their customers. Ethernet is the communication protocol for the majority of networks; it is embedded in both hardware devices and software. Ethernet became the standard protocol of computer networking with the help of companies such as Intel, Xerox and Digital.

Components of LAN

LAN consists of the following components.

a) Number of Computers, 2 or more
b) NIC, network interface cards in every computer
c) Ethernet Cable Cat5, STP/UTP
d) Router, and Hub or switch
e) Computer network software.

A LAN card, also called a network interface card, is installed in every computer that is to be included in the network; each NIC is assigned a unique address for identification. Computers are connected by Ethernet cable with an RJ45 connector at each end. Either two computers are attached directly to each other, or each computer is connected to a hub or switch, which enables the networked computers to communicate with each other. Switches and hubs work as relays.

In the same way, the entire network can be set up wirelessly: radio signals are used for communication instead of cable, and wireless LAN cards are used instead of simple LAN cards. A wireless network card has a small antenna which sends radio signals to, and receives them from, other computers or wireless hubs, switches and routers. A wireless LAN is very easy to set up and keeps the environment clean and organized; however, more configuration is required to keep it performing flawlessly. Wireless LAN is also known as Wi-Fi, which is covered in detail elsewhere.

Wireless LAN is an alternative technology to Ethernet; Token Ring is another. Token Ring was designed by IBM. ATM (Asynchronous Transfer Mode) is a further alternative, which can connect devices over very large distances. Ethernet is an established network standard and supports both small and medium-sized networks. It has been in use for the past three decades and has formed an excellent communication environment. Ethernet follows some rules, and before we come to them a few terms need to be explained.

a) Node – The devices attached to a segment are nodes.
b) Frame – Frames are short messages sent by nodes (computers and network devices). A frame is a chunk of information whose size varies.
c) Segment – A single shared medium is known as an Ethernet segment.
d) Medium – Ethernet devices are attached to a common communication medium, along which the frames of data travel. This medium can be coaxial cable; the most commonly used media are UTP/STP cable and fiber-optic cable.

Ethernet Set of Rules

Ethernet has a specific set of rules for generating frames.
• The length of a frame varies
• A frame must contain the source and destination addresses of the message for identification
• Nodes should be uniquely identifiable
• Each Ethernet device has a unique address
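The rules above can be made concrete with a simplified Python sketch that builds and parses the start of an Ethernet frame: a 6-byte destination address, a 6-byte source address and a 2-byte type field, followed by a variable-length payload. The helper names and the example addresses are assumptions for illustration, and the sketch leaves out the preamble, padding and the trailing checksum that real hardware adds.

```python
import struct

HEADER_FORMAT = "!6s6sH"  # destination MAC, source MAC, EtherType (network byte order)
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def bytes_to_mac(raw):
    """Convert 6 raw bytes back into colon-separated hexadecimal."""
    return ":".join(f"{byte:02x}" for byte in raw)


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Prefix the payload with destination, source and type fields."""
    header = struct.pack(HEADER_FORMAT, mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)
    return header + payload


def parse_frame(frame):
    """Split a frame into (destination, source, ethertype, payload)."""
    dst, src, ethertype = struct.unpack(HEADER_FORMAT, frame[:HEADER_LENGTH])
    return bytes_to_mac(dst), bytes_to_mac(src), ethertype, frame[HEADER_LENGTH:]
```

Because the payload is simply appended, frames built this way vary in length, while every frame still carries both addresses in a fixed position.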

TCP/IP

TCP/IP is the protocol that allows cooperating computers to share resources across a network. It was developed by a community of researchers centered around the ARPAnet. The ARPAnet is certainly the best-known TCP/IP network. However, as of June 1987, more than 130 vendors had equipment supporting TCP/IP, and thousands of networks used the protocol.

The most accurate name for the protocols discussed here is the Internet Protocol Suite. TCP and IP are two of the protocols in this suite. Because they are the most frequently used, it has become customary to combine the two names, as TCP/IP, to refer to the whole family.

The term Internet refers to the entire collection of networks. It includes regional networks such as NYsernet, the Arpanet, local networks at research centers and educational institutions, and a number of military networks. The subset of these networks managed on behalf of the Department of Defense is known as the DDN (Defense Data Network). All of the networks are interconnected. Provided that no policy or security restrictions apply to a network, data can be shared between the users of all networks. The Internet protocol documents are the standards that the Internet community has adopted for its own use.

TCP/IP is a family of protocols. A few of them, including IP, TCP and UDP, provide low-level functions required by many applications. Others are dedicated to specific tasks such as sending e-mail, finding out who is logged in on another system, or transferring files between computers. Initially, TCP/IP was used mostly between minicomputers or mainframes. These machines were self-contained and had their own disks. Here are some of the most traditional services performed by TCP/IP.

File Transfer: FTP (File Transfer Protocol) allows a user on one computer to send files to another computer. Security is handled by requiring a user name and a password for the other computer. FTP is a utility that is run whenever a file on another system needs to be accessed; generally it is used to copy the file to one’s own computer so that one can work on a personal copy.
Remote Login: TELNET (the network terminal protocol) allows a user to log in to any other computer system on the network. A remote session is started by specifying the computer to connect to. From then until the session ends, anything the user types is sent to the other computer. The telnet program effectively makes the user’s own computer invisible while it is running: every character typed is sent directly to the other system. The connection behaves much like a dial-up connection, in that the remote system asks the user to log in with a user name and password, just as it would ask a user who had dialed in. The telnet program exits when the user logs off the other computer.
Computer Mail: This allows users to send messages to users on other computers. Originally, people tended to use only one or two specific computers and kept a ‘mail file’ on those machines. The computer mail system is simply a way to add a message to another user’s mail file. This poses some problems in a microcomputer environment, because a micro is not well suited to receiving computer mail. When mail is sent, the mail software expects to open a connection to the addressee’s computer; if that computer is a micro, it may be turned off or may not be running the mail system.

These services are present in all implementations of TCP/IP, except that microcomputer implementations may not support computer mail. These traditional applications still play a significant role in TCP/IP-based networks. The way networks are used, however, has changed with time. Large, self-sufficient computer systems are less popular now. Today’s installations contain a mix of computers, including mainframes, minicomputers, workstations and microcomputers, which are often configured to perform specialized tasks.

Many people still prefer to work on just one computer system, which calls on other systems on the net for specialized services. This gave rise to the server/client model of networking services. A server is a system that provides certain services to the other systems on a network; a client is a computer system that asks for the service and makes use of it. The server and the client need not be separate machines: both functions can be performed by programs running on the same computer.
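As a rough illustration of the server/client model, the following Python sketch runs a tiny server and a client on the same computer, talking over TCP on the loopback address. The function names, the request text and the reply format are all invented for the example; it handles exactly one short request and is not meant as a production server.

```python
import socket
import threading


def serve_once(listener):
    """Server side: accept one client, read its request, send back a reply."""
    connection, _ = listener.accept()
    with connection:
        request = connection.recv(1024)
        connection.sendall(b"served: " + request)


def ask_server(port, request):
    """Client side: connect, send a request, return the server's reply."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request)
        return client.recv(1024)


def demo():
    """Run server and client in one process to show both roles on one machine."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    reply = ask_server(port, b"time?")
    worker.join()
    listener.close()
    return reply
```

Calling demo() starts the server in a background thread, lets the client make one request, and returns whatever the server sent back.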

Here are the kinds of servers typically present in a modern computer setup. Note that these computer services can all be provided within the framework of TCP/IP.

These protocols are an effective part of the Internet suite. Their definitions are not given here because support for them is widely available in commercial installations. Only a few of the simple services provided by TCP/IP have been listed.

Types of servers

Four commonly named types of server are FTP servers, proxy servers, online game servers and web servers. The client/server networking model is used by many systems, including e-mail services and web sites. Peer-to-peer networking, an alternative model, makes all computers act as servers and clients simultaneously. A few examples make the idea of a server clearer: name servers give information about Internet host names, FTP servers maintain FTP sites and provide files to the users who request them, mail servers deliver e-mail, web servers send web pages, and list servers administer mailing lists.

Physically, servers are like other computers, although their hardware configuration is often optimized for their role. In many servers the installed hardware resembles that of any other computer, but the software the server runs is very different from the software installed on other computers. Network servers are commonly configured with additional processing power, storage capacity and memory to improve their capacity to handle clients, that is, the other computers on the network.

The underlying hardware or software that drives a server is called the server platform. The term is often used in place of operating system.

Application Servers

Application servers occupy a large share of the computing territory between database servers and the end user, and they often connect the two. They are often referred to as middleware. Middleware is software that connects two otherwise separate applications. A number of middleware products can link a database system to a Web server. This enables users to request data from the database through forms displayed in a Web browser, and enables the Web server to return dynamic Web pages based on the user’s request and profile.

List Servers

List servers are used to improve the management of mailing lists of whatever type, whether they are interactive discussions open to the public or one-way lists that deliver newsletters, announcements or advertising.

Chat Servers

A chat server enables a number of people to exchange information in an environment similar to an Internet newsgroup, with real-time discussion capabilities. Real time is a term applied to a number of different computer features; real-time operating systems, for example, are systems that respond to input immediately.

IRC Servers

Internet Relay Chat (IRC) consists of various independent networks of servers that allow users to connect to each other via an IRC network. It is an option for those who are seeking real-time capabilities.

Fax Servers

A fax server is an ideal solution for organizations that want to reduce their incoming and outgoing telephone resources but still need to fax actual documents.

Groupware Servers

A groupware server is software designed to enable users to work together in a virtual environment, regardless of their location, through the Internet or a corporate intranet.

Mail Servers

Mail servers are almost as important as web servers. They send and store mail on corporate networks, through LANs and WANs, and across the Internet.

Telnet Servers

A Telnet server lets users log on to a host computer and perform work as if they were working on the remote computer itself.

News Servers

News servers act as a distribution and delivery source for the hundreds of public newsgroups accessible over the USENET news network. USENET is a global bulletin board system that can be reached via the Internet or via a variety of online services.

Proxy Servers

These servers work in-between a client programme (commonly a Web browser) and an external server (another server on web) to filter requests, improve performance, and share connections.

Servers play a very significant role in networking. A server that is out of order can halt the interconnectivity of all computers on its network. The rising use of the Internet by home and office users, along with the growth of corporate computer networks, has boosted the development of servers. Servers are everywhere in today’s computing, and we do not know what form they will take in future or what the coming generation will choose. We will have to wait and see how these serving computers are shaped in the near future.

FIREWALL

A firewall can either be software-based or hardware-based and is used to help keep a network secure. Its primary objective is to control the incoming and outgoing network traffic by analyzing the data packets and determining whether it should be allowed through or not, based on a predetermined rule set. A network's firewall builds a bridge between an internal network that is assumed to be secure and trusted, and another network, usually an external (inter)network, such as the Internet, that is not assumed to be secure and trusted. Many personal computer operating systems include software-based firewalls to protect against threats from the public Internet. Many routers that pass data between networks contain firewall components and, conversely, many firewalls can perform basic routing functions.

Types:

There are different types of firewalls depending on where the communication is taking place, where the communication is intercepted and the state that is being traced.

Network layer or packet filters

Network layer firewalls, also called packet filters, operate at a relatively low level of the TCP/IP protocol stack, not allowing packets to pass through the firewall unless they match the established rule set. The firewall administrator may define the rules; or default rules may apply. The term "packet filter" originated in the context of BSD operating systems.

Network layer firewalls generally fall into two sub-categories, stateful and stateless. Stateful firewalls maintain context about active sessions, and use that "state information" to speed packet processing. Any existing network connection can be described by several properties, including source and destination IP address, UDP or TCP ports, and the current stage of the connection's lifetime (including session initiation, handshaking, data transfer, or connection completion). If a packet does not match an existing connection, it will be evaluated according to the ruleset for new connections. If a packet matches an existing connection based on comparison with the firewall's state table, it will be allowed to pass without further processing.
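The stateful behaviour described above can be sketched in Python as follows. This is a deliberately reduced model under stated assumptions: the rule for new connections (only destination ports 80 and 443) is invented, the names are hypothetical, and the sketch ignores connection stages and timeouts that a real state table would track.

```python
# Hypothetical ruleset for new connections: only these destination ports may be opened.
ALLOWED_NEW_PORTS = {80, 443}

# State table: one entry per connection that has already been accepted.
state_table = set()


def connection_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """Build a key that is identical for both directions of one connection."""
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (protocol, endpoints[0], endpoints[1])


def allow(src_ip, src_port, dst_ip, dst_port, protocol="tcp"):
    """Return True if the packet may pass the firewall."""
    key = connection_key(src_ip, src_port, dst_ip, dst_port, protocol)
    if key in state_table:
        return True  # matches an existing connection: no rule evaluation needed
    if dst_port in ALLOWED_NEW_PORTS:
        state_table.add(key)  # remember the new connection for later packets
        return True
    return False
```

Once an outbound web request has been accepted, the reply travelling in the opposite direction matches the same state-table entry and passes, while an unrelated inbound packet to the same client port is still rejected.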

Stateless firewalls require less memory, and can be faster for simple filters that require less time to filter than to look up a session. They may also be necessary for filtering stateless network protocols that have no concept of a session. However, they cannot make more complex decisions based on what stage communications between hosts have reached.

Modern firewalls can filter traffic based on many packet attributes, such as source IP address, source port, destination IP address or port, and destination service such as WWW or FTP. They can also filter based on protocols, TTL values, the netblock of the originator, and many other attributes.
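Filtering on packet attributes can be pictured as walking an ordered rule list and applying the first rule that matches. The Python sketch below assumes a made-up rule format (action, source network, protocol, destination port) and made-up addresses from documentation ranges; real firewalls support many more attributes and their own rule syntax.

```python
import ipaddress

# Hypothetical rule set, checked top to bottom; None means "any".
# Each rule: (action, source network, protocol, destination port)
RULES = [
    ("deny", "203.0.113.0/24", None, None),  # block one netblock entirely
    ("allow", "0.0.0.0/0", "tcp", 80),       # WWW
    ("allow", "0.0.0.0/0", "tcp", 21),       # FTP control
]
DEFAULT_ACTION = "deny"  # applied when no rule matches


def filter_packet(src_ip, protocol, dst_port):
    """Return the action of the first rule matching the packet's attributes."""
    source = ipaddress.ip_address(src_ip)
    for action, network, rule_protocol, rule_port in RULES:
        if source not in ipaddress.ip_network(network):
            continue
        if rule_protocol is not None and rule_protocol != protocol:
            continue
        if rule_port is not None and rule_port != dst_port:
            continue
        return action
    return DEFAULT_ACTION
```

Rule order matters in this design: the netblock rule is listed first so that it overrides the general allow rules below it.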

Application-layer

Application-layer firewalls work on the application level of the TCP/IP stack (i.e., all browser traffic, or all telnet or ftp traffic), and may intercept all packets traveling to or from an application. They block other packets (usually dropping them without acknowledgment to the sender).

On inspecting all packets for improper content, firewalls can restrict or prevent outright the spread of networked computer worms and trojans. The additional inspection criteria can add extra latency to the forwarding of packets to their destination.

Application firewalls function by determining whether a process should accept any given connection. Application firewalls accomplish their function by hooking into socket calls to filter the connections between the application layer and the lower layers of the OSI model. Application firewalls that hook into socket calls are also referred to as socket filters. Application firewalls work much like a packet filter but application filters apply filtering rules (allow/block) on a per process basis instead of filtering connections on a per port basis. Generally, prompts are used to define rules for processes that have not yet received a connection. It is rare to find application firewalls not combined or used in conjunction with a packet filter.

Also, application firewalls further filter connections by examining the process ID of data packets against a ruleset for the local process involved in the data transmission. The extent of the filtering that occurs is defined by the provided ruleset. Given the variety of software that exists, application firewalls only have more complex rulesets for the standard services, such as sharing services. These per-process rulesets have limited efficacy in filtering every possible association that may occur with other processes. Also, these per-process rulesets cannot defend against modification of the process via exploitation, such as memory corruption exploits. Because of these limitations, application firewalls are beginning to be supplanted by a new generation of application firewalls that rely on mandatory access control (MAC), also referred to as sandboxing, to protect vulnerable services.

Proxies

A proxy server (running either on dedicated hardware or as software on a general-purpose machine) may act as a firewall by responding to input packets (connection requests, for example) in the manner of an application, while blocking other packets. A proxy server is a gateway from one network to another for a specific network application, in the sense that it functions as a proxy on behalf of the network user. Proxies make tampering with an internal system from the external network more difficult and misuse of one internal system would not necessarily cause a security breach exploitable from outside the firewall (as long as the application proxy remains intact and properly configured). Conversely, intruders may hijack a publicly reachable system and use it as a proxy for their own purposes; the proxy then masquerades as that system to other internal machines. While use of internal address spaces enhances security, crackers may still employ methods such as IP spoofing to attempt to pass packets to a target network.

Network address translation

Firewalls often have network address translation (NAT) functionality, and the hosts protected behind a firewall commonly have addresses in the "private address range", as defined in RFC 1918. Firewalls often have such functionality to hide the true address of protected hosts. Originally, the NAT function was developed to address the limited number of IPv4 routable addresses that could be used or assigned to companies or individuals as well as reduce both the amount and therefore cost of obtaining enough public addresses for every computer in an organization. Hiding the addresses of protected devices has become an increasingly important defense against network reconnaissance.
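A much simplified picture of address translation is sketched below in Python. The three private ranges are those defined in RFC 1918; everything else (the public address, the starting port number, the function names) is an assumption made for the example, and the sketch covers only outbound translation of a source address and port.

```python
import ipaddress

# The private address ranges defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

PUBLIC_IP = "198.51.100.1"  # stand-in for the firewall's single public address
nat_table = {}              # (private ip, private port) -> public port
next_port = 50000           # arbitrary first public port to hand out


def is_private(ip):
    """Return True if ip lies in one of the RFC 1918 ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in RFC1918_NETWORKS)


def translate_outbound(src_ip, src_port):
    """Rewrite a private source address and port to the shared public address."""
    global next_port
    if not is_private(src_ip):
        return src_ip, src_port  # already public: leave untouched
    key = (src_ip, src_port)
    if key not in nat_table:
        nat_table[key] = next_port  # allocate a fresh public port
        next_port += 1
    return PUBLIC_IP, nat_table[key]
```

Every internal host appears to the outside under the same public address, which is how the true addresses of protected hosts stay hidden in this sketch.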

OPTICAL NETWORKING

Optical networking is the connection of two or more networking devices by fiber-optic cable, for computer networking and for other uses such as Internet access, television, telecommunications and file sharing. It is used to provide high-speed connectivity in offices and homes. Various networking technologies are used to transmit data from one place to another, but optical networking provides some of the fastest data transmission available.

The working of optical networking depends on the components used in optical networks, chiefly the fiber-optic cables. With fiber optics, users can deliver data between two points at very high speed, because the data travels as light. The core is the part of the fiber that carries the transmission. Each core is wrapped in a special layer which keeps the light signals inside it and prevents the light from escaping outward. This greatly reduces data loss during transmission. As a result, optical networks work over large distances and can serve users across wide areas.

Types of Optical Networking:

There are several types of optical networking, all of which depend on optical networks. Some of the important types are as follows.

Passive Optical Networking:

Passive optical networking is a type of optical networking in which a single strand of optical fiber builds connections to multiple networking clients in different areas. Customers sometimes complain, however, that it can lower the speed of their Internet connection.

Synchronous Optical Networking:

Synchronous optical networking is another type of optical networking that deals with data transmission. Here the optical network monitors the data to ensure that it passes smoothly from one place to another. It is more effective than ordinary physical networking. It also observes the type of data: the data should be of one form so that it can be relayed properly.

Star networking:

Networking carried out with the help of star networks is called star networking. A star network connects a main computer system to multiple other computers over the network. Star networks can also enhance the performance of the connections in the network.

Benefits of Optical Networking:

Because optical networking is based on fast fiber-optic cables, it has many advantages for data transmission between multiple computers over a network. Some common advantages are given below.

Optical networks are faster than other modes of data transmission over distance. Coaxial cables are also used for data transmission, but they are much slower.
Optical networks are more reliable and convenient for users transmitting between distant places, because all the data is contained within the core of the fiber.
The connectivity of optical networking is more efficient than that of other connections between networks.

Drawbacks of Optical Networking:

Optical networking also has some disadvantages. The major ones are that fiber-optic networks are very expensive to construct and that fiber-optic cables are much more difficult to join than copper cables.

DATA MINING

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Continuous Innovation

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

Example

For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.
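The kind of pattern described in this example can be illustrated with a toy Python sketch that counts how often pairs of items appear in the same basket. The baskets below are invented purely for illustration and do not come from any retailer; the function names are also assumptions, and real data mining software works on far larger data with more refined statistics.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets, one set of items per customer visit.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("beer", "diapers").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, if_item, then_item):
    """Fraction of baskets containing if_item that also contain then_item."""
    with_if_item = [basket for basket in baskets if if_item in basket]
    if not with_if_item:
        return 0.0
    return sum(then_item in basket for basket in with_if_item) / len(with_if_item)
```

With these made-up baskets, two of the three baskets that contain diapers also contain beer, which is the sort of relationship a retailer might then investigate further.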

Data, Information, and Knowledge

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

operational or transactional data such as, sales, cost, inventory, payroll, and accounting

nonoperational data, such as industry sales, forecast data, and macro economic data

meta data - data about the data itself, such as logical database design or data dictionary definitions

Information

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

DATA WAREHOUSES

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

Uses of Data mining

Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detail transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.

How data mining works

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
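The association relationship described above (the beer-diaper case) is usually quantified with three measures: support (how often two items appear together), confidence (how often the second item appears given the first), and lift (confidence relative to the second item's overall frequency). The following is a minimal sketch, not the method used by any particular product. The function name `pair_rules`, the `min_support` threshold, and the baskets are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for every frequent item pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue                         # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return rules

# Hypothetical baskets, invented for illustration only.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["milk", "bread"],
]
for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than their individual frequencies would predict. With only five baskets the numbers carry no statistical weight; they only show the arithmetic.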

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.

Store and manage the data in a multidimensional database system.

Provide data access to business analysts and information technology professionals.

Analyze the data by application software.

Present the data in a useful format, such as a graph or table.
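The five elements above can be sketched end to end on a very small scale. The example below is an illustration under stated assumptions only: it uses Python's built-in `sqlite3` module as a stand-in for the warehouse (a relational store, not the multidimensional database mentioned above), and the table name `sales`, the column names, and the raw rows are hypothetical.

```python
import sqlite3

# Hypothetical raw point-of-sale rows: (store, product, quantity, unit price), all text.
RAW_ROWS = [
    ("north", "beer", "2", "5.50"),
    ("north", "diapers", "1", "12.00"),
    ("south", "beer", "3", "5.50"),
    ("south", "diapers", "", "12.00"),   # missing quantity, dropped in transform
]

def transform(rows):
    """Transform step: convert text fields to numbers, skip unparseable rows."""
    for store, product, qty, price in rows:
        try:
            yield store, product, int(qty), float(price)
        except ValueError:
            continue

def load(conn, rows):
    """Load and store step: write the transformed rows into the database."""
    conn.execute(
        "CREATE TABLE sales (store TEXT, product TEXT, qty INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def revenue_by_product(conn):
    """Access and analysis step: total revenue per product."""
    query = (
        "SELECT product, SUM(qty * price) FROM sales "
        "GROUP BY product ORDER BY product"
    )
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(RAW_ROWS))

# Presentation step: a plain two-column table.
for product, revenue in revenue_by_product(conn):
    print(f"{product:<10}{revenue:>8.2f}")
```

Keeping each element in its own function mirrors the separation described in the list, so that, for instance, the analysis query can change without touching the loading code.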

Different levels of analysis are available:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.

Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.

Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
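Of the techniques listed above, the nearest neighbor method is the simplest to show in code. The sketch below assumes numeric features, Euclidean distance as the similarity measure, and a majority vote as the "combination of the classes"; other distance measures and weighting schemes are equally valid. The function name `knn_classify` and the historical records are hypothetical.

```python
import math
from collections import Counter

def knn_classify(history, record, k=3):
    """Classify `record` by majority vote among its k nearest labelled neighbours."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort the historical records by distance to the new record, keep the k closest.
    nearest = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, class).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"),
    ((7.0, 6.0), "high"),
    ((6.5, 7.0), "high"),
]
print(knn_classify(history, (1.8, 1.6)))
print(knn_classify(history, (6.4, 6.2)))
```

Sorting the whole history for every query is acceptable for a sketch; large historical datasets would normally use a spatial index instead.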

Technological infrastructure

Today, data mining applications are available on all size systems for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:

Size of the database: the more data being processed and maintained, the more powerful the system required.

Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.

Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.

Data mining involves six common classes of tasks:

Anomaly detection (Outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation.

Association rule learning (Dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering – is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.

Classification – is the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".

Regression – Attempts to find a function which models the data with the least error.

Summarization – providing a more compact representation of the data set, including visualization and report generation.
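As one concrete instance of the regression task in the list above, the sketch below fits a straight line by ordinary least squares, which minimizes the sum of squared errors. This is only one of many regression methods, and "least error" is assumed here to mean least squared error. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend against units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(spend, units)
print(f"units = {slope:.2f} * spend + {intercept:.2f}")
```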

DATA CAPTURE

Data capture means collecting data from external and internal sources. If the captured data is not stored, it can be lost. Keeping this data in some form, such as records and, most importantly, in a database, is known as data entry.

An Electronic Data Capture (EDC) system is a computerized system designed for the collection of clinical data in electronic format for use mainly in human clinical trials. EDC replaces the traditional paper-based data collection methodology to streamline data collection and expedite the time to market for drugs and medical devices. EDC solutions are now widely adopted by pharmaceutical companies and clinical research organizations (CRO) around the world.

Typically, EDC systems provide:

a graphical user interface component for data entry
a validation component to check user data
a reporting tool for analysis of the collected data
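The validation component in the list above typically runs checks at the moment of data entry. The sketch below shows what such checks might look like. It is not taken from any EDC product: the field names (`age`, `systolic`, `diastolic`) and the allowed ranges are hypothetical and would in practice come from the study protocol.

```python
def check_visit(record):
    """Return a list of human-readable problems found in one visit record."""
    problems = []

    age = record.get("age")
    if age is None:
        problems.append("age is missing")
    elif not 18 <= age <= 90:               # hypothetical eligibility range
        problems.append(f"age {age} outside the allowed range 18-90")

    systolic = record.get("systolic")
    diastolic = record.get("diastolic")
    if systolic is None or diastolic is None:
        problems.append("blood pressure is incomplete")
    elif systolic <= diastolic:             # cross-field consistency check
        problems.append("systolic pressure must exceed diastolic pressure")

    return problems

# Hypothetical entries typed in at a clinic.
print(check_visit({"age": 45, "systolic": 120, "diastolic": 80}))
print(check_visit({"age": 130, "systolic": 70, "diastolic": 90}))
```

Returning every problem at once, rather than stopping at the first, lets the person entering the data resolve all queries in a single pass.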

EDC systems are used by life sciences organizations, broadly defined as the pharmaceutical, medical device and biotechnology industries, in all aspects of clinical research. They are particularly beneficial for late-phase (phase III-IV) studies, pharmacovigilance and post-market safety surveillance, although some EDC systems, such as SciAn Services edc^pro, are specialized to include tools for phases I-II.

EDC can increase data accuracy and decrease the time needed to collect data for studies of drugs and medical devices. The trade-off that many drug developers encounter when deploying an EDC system is a relatively high start-up cost, followed by significant benefits over the duration of the trial. As a result, for EDC to be economical, the savings over the life of the trial must be greater than the set-up costs. This is often aggravated by two conditions: 1) the initial design of the study in EDC does not facilitate a decrease in costs over the life of the study, due to poor planning or inexperience with EDC deployment; and 2) initial set-up costs are higher than anticipated, again due to poor planning or inexperience with EDC deployment. The net effect is to increase both the cost and the risk to the study with insignificant benefits. However, with the maturation of today’s EDC solutions, such as Oracle Remote Data Capture (RDC), OmniComm Systems’ TrialMaster EDC, and Clinion™, much of the earlier burden of study design and set-up has been alleviated through technologies that allow for point-and-click and drag-and-drop design modules. With little to no programming required, and reusability from global libraries and standardized forms such as CDISC’s CDASH, deploying EDC can now rival paper processes in terms of study start-up time. As a result, even earlier-phase studies have begun to adopt EDC technology.

History

EDC is often cited as having its origins in another class of software, Remote Data Entry (RDE), that surfaced in the life sciences market in the late 1980s and early 1990s. However, its origins actually begin in the mid-1970s with a contract research organization known then as Institute for Biological Research and Development (IBRD). Dr. Richard Nichol and Joe Bollert contracted with Abbott Pharmaceuticals for the IBRD 'network' of Clinical Investigators to each have a computer and 'directly' enter clinical study data to the IBRD mainframe. IBRD then cleaned the data and provided reports to Abbott.

Clinical research data (patient data collected during the investigation of a new drug or medical device) is collected by physicians, nurses, and research study coordinators in medical settings (offices, hospitals, universities) throughout the world. Historically, this information was collected on paper forms which were then sent to the research sponsor (e.g., a pharmaceutical company) for data entry into a database and subsequent statistical analysis environment. However, this process had a number of shortcomings:

data are copied multiple times, which produces errors
errors that are generated are not caught until weeks later
visibility into the medical status of patients by sponsors is delayed

To address these and other concerns, RDE systems were invented so that physicians, nurses, and study coordinators could enter the data directly at the medical setting. By moving data entry out of the sponsor site and into the clinic or other facility, a number of benefits could be derived:

data checks could be implemented during data entry, preventing some errors altogether and immediately prompting for resolution of other errors
data could be transmitted nightly to sponsors, thereby improving the sponsor's ability to monitor the progress and status of the research study and its patients

These early RDE systems used "thick-client" software—software installed locally on a laptop computer's hardware—to collect the patient data. The system could then use a modem connection over an analog phone line to periodically transmit the data back to the sponsor, and to collect questions from the sponsor that the medical staff would need to answer.

Though effective, RDE brought with it several shortcomings as well. The most significant shortcoming was that hardware (e.g., a laptop computer) needed to be deployed, installed, and supported at every investigational (medical) site. In addition to being expensive for sponsors and complicated for medical staff, this model resulted in a proliferation of laptop computers at many investigational sites that participated in more than one research study simultaneously. Usability and space constraints led to a lot of dissatisfaction among medical practitioners. With the rise of the Internet in the mid 1990s, the obvious solution to some of these issues was the adoption of web-based software that could be accessed using existing computers at the investigational sites. EDC represents this new class of software.

Future of Data capture

As EDC software continues to mature, vendors are including capabilities that would have previously been developed and sold as separate software solutions: clinical data management systems (CDMS), clinical trial management systems (CTMS), business intelligence and reporting, and others. Efforts are being made to integrate payment execution tied to EDC data as well (such as Greenphire's eClinicalGPS product). This convergence is expected to continue until electronic patient medical records become more pervasive within the broader healthcare ecosystem—at which point the ideal solution would be to extract patient data directly from the electronic medical records as opposed to collecting the data in a separate data collection instrument. Standards such as CDISC and HL7 are already enabling this type of interoperability to be explored.

DATA CAPTURE METHODS

Multiple methods are available for capturing data from unstructured documents (letters, invoices, email, fax, forms etc). The list of methods identified below is not exhaustive, but it is a guide to the appropriate usage of each method when addressing business process automation projects.

As well as the method of data capture, the origin of the document(s) to be captured must be considered, to see whether the documents are available in their original electronic format. Capturing from the electronic original has the potential to massively increase data capture accuracy and removes the need for printing and scanning. Methods of capture from documents in electronic format are identified below.

Whenever a method of capture is considered, it is advisable in the first instance to consider the original documents, to determine if the document or form can be updated to improve the capture/recognition process and method. Investigating the existing line of business systems, to determine what additional metadata can be extracted for free using a single reference, can provide significant advantages.

Choosing the correct method(s) of metadata capture for a particular business process automation project means considering all the methods identified below; the use of one or several of them may be appropriate.

Manual keying

Manual keying of metadata from unstructured data is appropriate for data that is received in low volumes and results in low levels of recognition by intelligent data capture products (IDR, ICR).

Offshore keying

Offshore keying of Metadata is most appropriate for the following reasons:

High volumes of individual documents where the level of recognition achieved using intelligent data capture products is low (can include documents with a high level of handwritten data).
Potentially capturing the data that has not been successfully captured using an Intelligent data capture product.
High volume of individual documents where the data to be extracted is not consistent from page to page.
Can be very cost efficient based on the lower labour costs that can be achieved.

Single click

Single click is an Optical Character Recognition (OCR) tool that can be used to capture machine produced characters in low volume ad-hoc capture applications and to populate a line of business application.

OCR (Optical Character Recognition)

OCR as a technology provides the ability to successfully capture machine produced characters in preset zones or across the full page. OCR systems can recognise many different OCR fonts, as well as typewriter and computer-printed characters. Dependent upon the capabilities of the particular OCR product, this can be used to capture low to high volumes of data, where the information is in consistent location(s) on the documents.

ICR (Intelligent Character Recognition)

ICR is the computer translation of hand printed and written characters. Data is entered from hand-printed forms through a scanner, and the image of the captured data is then analysed and translated by sophisticated ICR software. ICR is similar to optical character recognition (OCR) but is a more difficult process, since OCR works from printed text whereas ICR must handle handwritten characters.

Bar code recognition

Dependent upon the type of barcode that is used, the amount of metadata that can be included is high, as is the level of recognition. The application of single or multiple bar codes to particular document types such as Proof of Delivery notes, membership forms, application forms, gift aid etc, can dramatically increase the effectiveness of a business process.

Template based intelligent capture

The level of capability is dependent upon the individual template based intelligent capture product. More advanced products are able to identify machine produced and, to a lesser degree, handwritten characters that are contained in particular area(s) of a document. These applications are used where the number of document types being received is relatively low (typically up to 30 different document types) but consistent. They are used in applications such as census, inter-bank transfers and application forms.

Intelligent Document Recognition (IDR)

The level of capability is dependent upon the individual product. These applications are used to capture metadata from documents that is rules based. For example, the product will identify post codes, logos, key words, VAT registration numbers and, through an ongoing learning process, capture information from multiple document types.

This type of capture is used for high volume invoice processing and digital mailroom applications, where the classification and indexing of incoming documents is key. IDR software applications use rules to identify and capture information from semi-structured documents. Rules, specified by end users, look for specific text on a document to identify the document type and additional rules can then be applied to each different type from then on, extracting different metadata fields from each type.

These applications are commonly used for digital mailroom environments, with the idea that documents are taken out of their envelopes and fed straight into a scanner with very little manual processing.

Specialised applications exist for departmental projects such as invoice processing. IDR applications can hold information about suppliers generated from other line-of-business systems and match invoices to that information, using recognised text such as VAT number, telephone number, post code etc. The application then looks for keyword identifiers on the invoice and extrapolates the value nearby. Validation rules are then applied, for example the NET amount plus the VAT amount must equal the gross amount, minimising the chance for errors.
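The keyword lookup and validation rule described above (NET plus VAT must equal gross) can be sketched as follows. This is an illustration under assumptions, not the behaviour of any specific IDR product: the keyword phrases, the regular expressions in `FIELD_PATTERNS`, and the sample text are hypothetical, and real recognised text is usually far less tidy.

```python
import re
from decimal import Decimal

# Hypothetical keyword identifiers; the amount is taken from the text that follows.
FIELD_PATTERNS = {
    "net": re.compile(r"net\s+amount\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE),
    "vat": re.compile(r"vat\s+amount\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE),
    "gross": re.compile(r"gross\s+amount\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract_amounts(text):
    """Find each keyword and capture the amount printed next to it."""
    amounts = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Decimal avoids binary floating-point rounding in money sums.
            amounts[name] = Decimal(match.group(1).replace(",", ""))
    return amounts

def is_consistent(amounts):
    """Validation rule: all three fields found, and net + VAT equals gross."""
    if set(amounts) != set(FIELD_PATTERNS):
        return False
    return amounts["net"] + amounts["vat"] == amounts["gross"]

# Hypothetical recognised text from a scanned invoice.
recognised_text = """
Net amount: 1,000.00
VAT amount: 200.00
Gross amount: 1,200.00
"""
amounts = extract_amounts(recognised_text)
print(amounts, is_consistent(amounts))
```

An invoice that fails the check would normally be routed to a person for manual correction rather than rejected outright.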

Methods of capture from electronic formats

Capturing data from source (digital) documents and forms

In our experience, organisations often reduce everything to paper format before going through the process of capturing data. They often do this even when they receive the information in its original digital format. Where this is the case, it is unnecessary, time consuming and costly and often results in a lower level of success in extracting the required data.

Where information is available in its original digital format, tools such as Formate enable organisations to automate the receipt and interrogation of searchable pdf, Word docs, electronic forms, instant messaging, etc, thus capturing the required data digitally and negating the need to print and scan these documents prior to using ICR, OCR, IDR or any of the techniques identified above. As an example, invoices received via email in a searchable pdf format, can potentially have the required data automatically extracted with a high level of accuracy and no human input.

Legacy data import

Products such as Alchemy Datagrabber Module, Formate and Onbase allow organisations with legacy systems (mainframe systems) to ingest data for improved search and archival applications.

Examples include cheque requisition reports, property tax reports, invoice and credit note runs. The reports would be parsed by the application and broken down into individual records or pages. At the same time, index information is extracted from each record or page and associated with that record or page.
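The parsing step described above, breaking a report into records and extracting index information from each, can be sketched in a few lines. The report layout is an assumption made for this example: each record is taken to start with a line of the form `INVOICE <number>  <customer>`. The function name `split_report` and the sample report are hypothetical and do not reflect how Datagrabber, Formate or Onbase work internally.

```python
import re

# Assumption: every record begins with a header line such as "INVOICE 1001  ACME LTD".
RECORD_START = re.compile(r"^INVOICE\s+(\d+)\s+(.+)$")

def split_report(report):
    """Split a text report into records, each carrying its extracted index fields."""
    records = []
    current = None
    for line in report.splitlines():
        match = RECORD_START.match(line)
        if match:
            # A header line opens a new record and supplies the index information.
            current = {
                "invoice_no": match.group(1),
                "customer": match.group(2).strip(),
                "lines": [],
            }
            records.append(current)
        elif current is not None and line.strip():
            current["lines"].append(line.strip())
    return records

# Hypothetical legacy print-run output.
report = """\
INVOICE 1001  ACME LTD
  Widgets      10   25.00
  Total             250.00
INVOICE 1002  BRIGHT PLC
  Gaskets       4   12.50
  Total              50.00
"""
for record in split_report(report):
    print(record["invoice_no"], record["customer"], len(record["lines"]))
```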

The full text content of the document is also made available for searching. To improve the presentation of the document to the end user, an overlay can be added. The Overlay can be a representation of the form or paper that the original report would have been printed on. Therefore, in the case of an invoice, the record resembles the original printed invoice. Datagrabber can also be used to import images, or files, along with indexing information extracted from a legacy system or from a manually created file. It can also be used to create the required structure of a database within Alchemy.

Voice Capture

The capture of pure voice records and voice forms is becoming as important for businesses as other forms of communication (email, web forms, fax). Applications such as CallXpress provide the ability to capture voice commands to initiate business processes, store voice records alongside all other forms of communication for future reference in a document management system and convert speech to text. In the case of speech to text, this provides the opportunity to utilise OCR, ICR, IDR technology to support the business needs. Contact centres provide a good example of where the combination of voice, instant messaging, email, fax and web forms will all be found supporting a common business process.

DATA ANALYSIS

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes. Business intelligence covers data analysis that relies heavily on aggregation, focusing on business information. In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data and CDA on confirming or falsifying existing hypotheses. Predictive analytics focuses on application of statistical or structural models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a species of unstructured data. All are varieties of data analysis.

Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The term data analysis is sometimes used as a synonym for data modeling.

Types of Data

Data can be of several types:

Quantitative data: data is a number
- Often this is a continuous decimal number to a specified number of significant digits
- Sometimes it is a whole counting number
Categorical data: data is one of several categories
Qualitative data: data is a pass/fail or the presence or lack of a characteristic

Process of Data Analysis

Data analysis is a process, within which several phases can be distinguished:

Data cleaning

Data cleaning is an important procedure during which the data are inspected and erroneous data are corrected, if that is necessary, preferable, and possible. Data cleaning can be done during the stage of data entry. If this is done, it is important that no subjective decisions are made. The guiding principle provided by Adèr is: during subsequent manipulations of the data, information should always be cumulatively retrievable. In other words, it should always be possible to undo any data set alterations. Therefore, it is important not to throw information away at any stage in the data cleaning phase. All information should be saved (i.e., when altering variables, both the original values and the new values should be kept, either in a duplicate data set or under a different variable name), and all alterations to the data set should be carefully and clearly documented, for instance in a syntax or a log.
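The principle of keeping original values and logging every alteration can be sketched as follows. The variable name `age`, the new variable `age_clean`, and the plausibility range 0-120 are hypothetical choices made for this example.

```python
def clean_ages(rows, log):
    """Add a cleaned copy of 'age' under a new name; never overwrite the original."""
    cleaned = []
    for index, row in enumerate(rows):
        new_row = dict(row)                  # work on a copy, input rows stay intact
        age = row["age"]
        if age is not None and not 0 <= age <= 120:
            new_row["age_clean"] = None      # impossible value treated as missing
            log.append(f"row {index}: age {age} set to missing (outside 0-120)")
        else:
            new_row["age_clean"] = age
        cleaned.append(new_row)
    return cleaned

# Hypothetical raw data with one impossible value.
rows = [{"id": 1, "age": 34}, {"id": 2, "age": 340}, {"id": 3, "age": None}]
log = []
cleaned = clean_ages(rows, log)
print(cleaned)
print(log)
```

Because the original `age` value is still present in every cleaned row, and the log records what was changed and why, any alteration can be undone later.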

Initial data analysis

The most important distinction between the initial data analysis phase and the main analysis phase is that during initial data analysis one refrains from any analyses that are aimed at answering the original research question. The initial data analysis phase is guided by the following four questions:

Quality of data

The quality of the data should be checked as early as possible. Data quality can be assessed in several ways, using different types of analyses: frequency counts, descriptive statistics (mean, standard deviation, median), normality (skewness, kurtosis, frequency histograms, normal probability plots), associations (correlations, scatter plots). Other initial data quality checks are:

Checks on data cleaning: have decisions influenced the distribution of the variables? The distribution of the variables before data cleaning is compared to the distribution of the variables after data cleaning to see whether data cleaning has had unwanted effects on the data.
Analysis of missing observations: are there many missing values, and are the values missing at random? The missing observations in the data are analyzed to see whether more than 25% of the values are missing, whether they are missing at random (MAR), and whether some form of imputation is needed.
Analysis of extreme observations: outlying observations in the data are analyzed to see if they seem to disturb the distribution.
Comparison and correction of differences in coding schemes: variables are compared with coding schemes of variables external to the data set, and possibly corrected if coding schemes are not comparable.
Test for common-method variance.

The choice of analyses to assess the data quality during the initial data analysis phase depends on the analyses that will be conducted in the main analysis phase.
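The missing-observation check mentioned above (are more than 25% of the values missing?) is straightforward to automate. The sketch below assumes that missing values are represented as `None`; the function name `missing_report`, the variable names and the rows are hypothetical. It does not test whether values are missing at random, which requires a separate analysis.

```python
def missing_report(rows, variables, threshold=0.25):
    """Return {variable: (fraction_missing, exceeds_threshold)}."""
    report = {}
    for name in variables:
        missing = sum(1 for row in rows if row.get(name) is None)
        fraction = missing / len(rows)
        report[name] = (fraction, fraction > threshold)
    return report

# Hypothetical survey rows; None marks a missing observation.
rows = [
    {"age": 31, "income": 42000},
    {"age": None, "income": None},
    {"age": 45, "income": None},
    {"age": 52, "income": 38000},
]
for name, (fraction, flagged) in missing_report(rows, ["age", "income"]).items():
    print(f"{name}: {fraction:.0%} missing, above threshold: {flagged}")
```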

Quality of measurements

The quality of the measurement instruments should only be checked during the initial data analysis phase when this is not the focus or research question of the study. One should check whether the structure of the measurement instruments corresponds to the structure reported in the literature. There are two ways to assess measurement quality:

Confirmatory factor analysis
Analysis of homogeneity (internal consistency), which gives an indication of the reliability of a measurement instrument. During this analysis, one inspects the variances of the items and the scales, the Cronbach's α of the scales, and the change in Cronbach's α when an item is deleted from a scale.
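Cronbach's α for a scale of k items is commonly computed as k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The sketch below applies that formula using sample variances from Python's `statistics` module. The function names and the item scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed")
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

def alpha_if_deleted(items):
    """Alpha of the remaining scale after removing each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores of five respondents on a three-item scale.
items = [
    [3, 4, 5, 2, 4],
    [3, 5, 4, 2, 4],
    [2, 4, 5, 3, 5],
]
print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_deleted(items)])
```

An item whose removal raises α noticeably is a candidate for closer inspection, since it may not measure the same construct as the rest of the scale.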

Initial transformations

After assessing the quality of the data and of the measurements, one might decide to impute missing data, or to perform initial transformations of one or more variables, although this can also be done during the main analysis phase. Possible transformations of variables are:

Square root transformation (if the distribution differs moderately from normal)
Log-transformation (if the distribution differs substantially from normal)
Inverse transformation (if the distribution differs severely from normal)
Make categorical (ordinal / dichotomous) (if the distribution differs severely from normal, and no transformations help)
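The effect of the transformations listed above can be checked by comparing skewness before and after. The sketch below uses the population skewness (the mean of the cubed standardized values) as a simple measure of departure from symmetry. The data are hypothetical and strictly positive, which the square root, log, and inverse transformations all require here.

```python
import math

def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

TRANSFORMS = {
    "none": lambda v: v,
    "square root": math.sqrt,
    "log": math.log,
    "inverse": lambda v: 1 / v,   # note: reverses the ordering of the values
}

# Hypothetical right-skewed, strictly positive measurements.
values = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
for name, func in TRANSFORMS.items():
    transformed = [func(v) for v in values]
    print(f"{name:<12} skewness = {skewness(transformed):.2f}")
```

A skewness close to zero indicates a roughly symmetric distribution; it does not by itself establish normality, so a histogram or normal probability plot remains worthwhile.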

Characteristics of data sample

In any report or article, the structure of the sample must be accurately described. It is especially important to exactly determine the structure of the sample (and specifically the size of the subgroups) when subgroup analyses will be performed during the main analysis phase.
The characteristics of the data sample can be assessed by looking at:

Basic statistics of important variables
Scatter plots
Correlations
Cross-tabulations

Final stage of the initial data analysis

During the final stage, the findings of the initial data analysis are documented, and necessary, preferable, and possible corrective actions are taken. Also, the original plan for the main data analyses can and should be specified in more detail and/or rewritten. In order to do this, several decisions about the main data analyses can and should be made:

In the case of non-normality: should one transform variables; make variables categorical (ordinal/dichotomous); adapt the analysis method?
In the case of missing data: should one neglect or impute the missing data; which imputation technique should be used?
In the case of outliers: should one use robust analysis techniques?
In case items do not fit the scale: should one adapt the measurement instrument by omitting items, or rather ensure comparability with other (uses of the) measurement instrument(s)?
In the case of (too) small subgroups: should one drop the hypothesis about inter-group differences, or use small sample techniques, like exact tests or bootstrapping?
In case the randomization procedure seems to be defective: can and should one calculate propensity scores and include them as covariates in the main analyses?
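
As an example of the small-sample techniques mentioned above, the sketch below computes a two-sided Fisher's exact test for a 2x2 table from the hypergeometric distribution. Fisher's test is chosen here as one well-known exact test; the text does not prescribe a specific one, and the function name and the table counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    at most as likely as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Made-up subgroups of four cases each.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```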

Analyses

Several analyses can be used during the initial data analysis phase:

Univariate statistics
Bivariate associations (correlations)
Graphical techniques (scatter plots)

It is important to take the measurement levels of the variables into account for the analyses, as special statistical techniques are available for each level:

Nominal and ordinal variables
- Frequency counts (numbers and percentages)
- Associations
  - cross-tabulations
  - hierarchical loglinear analysis (restricted to a maximum of 8 variables)
  - loglinear analysis (to identify relevant/important variables and possible confounders)
- Exact tests or bootstrapping (in case subgroups are small)
- Computation of new variables

Continuous variables
- Distribution
  - Statistics (M, SD, variance, skewness, kurtosis)
  - Stem-and-leaf displays
  - Box plots
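
The numbers behind a box plot can be sketched in Python as follows. The example assumes Tukey's convention of placing the fences 1.5 interquartile ranges beyond the quartiles, which is one common choice rather than something stated in the text; the function name and the data are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers as drawn in a typical box plot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: Tukey's rule, fences at 1.5 * IQR beyond the quartiles.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up continuous variable with one extreme value.
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```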

Main analysis

In the main analysis phase, analyses aimed at answering the research question are performed, as well as any other relevant analyses needed to write the first draft of the research report.

Exploratory and confirmatory approaches

In the main analysis phase, either an exploratory or a confirmatory approach can be adopted. Usually the approach is decided before data is collected. In an exploratory analysis, no clear hypothesis is stated before analysing the data, and the data is searched for models that describe it well. In a confirmatory analysis, clear hypotheses about the data are tested.

Exploratory data analysis should be interpreted carefully. When testing multiple models at once, there is a high chance of finding at least one of them to be significant, but this can be due to a Type I error. It is important to always adjust the significance level when testing multiple models, for example with a Bonferroni correction. Also, one should not follow up an exploratory analysis with a confirmatory analysis in the same dataset. An exploratory analysis is used to find ideas for a theory, but not to test that theory as well. When a model is found through exploratory analysis of a dataset, following up with a confirmatory analysis in the same dataset could simply mean that the results of the confirmatory analysis are due to the same Type I error that produced the exploratory model in the first place. The confirmatory analysis will therefore not be more informative than the original exploratory analysis.
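
The multiple-testing problem and the Bonferroni correction can be made concrete with a small Python sketch. The family-wise error rate formula assumes independent tests, and the function names and p-values are invented for the example.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-adjusted level."""
    adjusted_alpha = alpha / len(p_values)
    return [p <= adjusted_alpha for p in p_values]


# With 20 independent tests at alpha = 0.05, a false positive is likely.
print(round(familywise_error_rate(0.05, 20), 3))

# Made-up p-values from three exploratory models.
print(bonferroni_significant([0.001, 0.02, 0.04]))
```

With three tests the adjusted level is 0.05 / 3, so only p-values at or below about 0.0167 remain significant.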

Stability of results

It is important to obtain some indication about how generalizable the results are. While this is hard to check, one can look at the stability of the results. Are the results reliable and reproducible? There are two main ways of doing this:

Cross-validation: By splitting the data into multiple parts, we can check whether analyses (like a fitted model) based on one part of the data generalize to another part of the data as well.
Sensitivity analysis: A procedure to study the behavior of a system or model when global parameters are (systematically) varied. One way to do this is with bootstrapping.
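
Both stability checks can be sketched in a few lines of Python. The example below shows a k-fold split for cross-validation and a percentile bootstrap interval for a statistic. The function names, the fixed seeds and the number of resamples are assumptions made for this illustration.

```python
import random
from statistics import mean


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def bootstrap_interval(sample, stat=mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of the sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    low = stats[int((1 - level) / 2 * n_resamples)]
    high = stats[int((1 + level) / 2 * n_resamples) - 1]
    return low, high


# Made-up measurements.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]
for train, test in k_fold_splits(len(data), k=5):
    print("train", sorted(train), "test", sorted(test))
print(bootstrap_interval(data))
```

A wide bootstrap interval, or results that change markedly from fold to fold, suggests that the findings may not be stable.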

Free software for data analysis:

ROOT - C++ data analysis framework developed at CERN
PAW - FORTRAN/C data analysis framework developed at CERN
JHepWork - Java (multi-platform) data analysis framework developed at ANL
KNIME - the Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
Data Applied - an online data mining and data visualization solution.