


Parallel Computing (Parallel Processing)
In order to pull a bigger wagon, it is easier to add more horses than to grow a gigantic horse; likewise, a large number of chickens, if coordinated in strength and efficiency, can do a better job than a small number of oxen. We learned to fly not by constructing a machine that flaps its wings like a bird, but by applying the aerodynamic principles demonstrated by nature. Similarly, although an individual neuron responds slowly (on the order of milliseconds), the aggregate speed with which complex calculations are carried out by neurons is tremendously high. The brain simultaneously processes incoming stimuli of differing quality, as in vision: it divides what it sees into four components: colour, motion, shape, and depth. These are individually analyzed and then compared to stored memories, which helps the brain identify what you are viewing. The brain then combines all of these into the field of view that you see and comprehend. This nicely expresses the concept of Parallel Processing, that is, the ability to carry out multiple operations or tasks simultaneously. The term is used both in the context of human cognition, particularly the brain's ability to process incoming stimuli simultaneously, and in Parallel Computing by computing machines. The speed of sequential computers has been doubling roughly every eighteen months, but that speed is limited by the state of the art in integrated-circuit design and manufacturing. Silicon-based processor chips are also reaching their physical limits in processing speed, as they are constrained by the speed of electricity and light and by the laws of thermodynamics.
Many applications today require more computing power than a traditional sequential computer can offer. Parallel Processing provides a cost-effective solution: the number of CPUs in a computer is increased and an efficient communication system is added between them. The workload can then be shared between the processors, resulting in much higher computing power and performance than a traditional single-processor system can achieve. Hence, high-performance computing requires the use of high-end Supercomputers or Massively Parallel Processing (MPP) systems containing thousands of powerful CPUs. A Massively Parallel Processor (MPP) is a single computer with many networked processors, typically more than 100. In an MPP, each CPU contains its own memory and its own copy of the Operating System and application, and each subsystem communicates with the others via a high-speed interconnect. With the help of an intelligent compiler, a given computationally intensive task can be split among multiple processors working simultaneously, as was done by Cray and PARAM. The PARAM Supercomputer runs an Operating System built around the PARAS microKernel, a tiny Operating System core, much smaller than traditional ones, designed to achieve the efficiency and flexibility needed. IBM's Blue Gene/P is also a Massively Parallel Supercomputer.
Parallel Processing is also called Parallel Computing. In the quest for cheaper computing alternatives, sophisticated distributed-computing software can use Parallel Processing to put the idle processor cycles available across a network to effective use. Parallel Computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved "in parallel". Parallel Processing makes programmes run faster because there are more engines (CPUs or cores) running them: a given task is divided into multiple subtasks, and each subtask is processed on a different CPU. Multiprocessing is a type of processing in which two or more processors work together to execute more than one programme simultaneously; the term Multiprocessor refers to the hardware architecture that allows multiprocessing, that is, the ability of a system to support more than one processor and to allocate tasks between them. Multiprocessing sometimes refers instead to the execution of multiple concurrent software processes in a system, as opposed to a single process at any one instant. A Multiprocessor is a tightly coupled computer system having two or more processing units (CPUs) that share resources in order to process more than one programme at the same time. Parallel Processing relies on the ability to process "chunks" or blocks of a programme simultaneously; to do this, the computer must have several CPUs and software to coordinate them, or it must be able to simulate such a situation. There are several different forms of Parallel Computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing.
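As a minimal sketch of this idea (not specific to any particular machine; the thread count and data size are illustrative values only), the following C programme divides the work of summing an array into subtasks and hands each subtask to a separate POSIX thread:
```c
/* Sketch: divide one task (summing an array) into subtasks,
 * one per POSIX thread; the main thread combines the results. */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double data[N];

struct chunk { int start, end; double partial; };

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    c->partial = 0.0;
    for (int i = c->start; i < c->end; i++)
        c->partial += data[i];
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct chunk work[NTHREADS];

    for (int i = 0; i < N; i++)
        data[i] = 1.0;

    /* Divide the index range into one subtask per thread. */
    for (int t = 0; t < NTHREADS; t++) {
        work[t].start = t * (N / NTHREADS);
        work[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, sum_chunk, &work[t]);
    }

    /* Wait for every subtask and combine the partial sums. */
    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += work[t].partial;
    }
    printf("sum = %f\n", total);
    return 0;
}
```
On a machine with several CPUs or cores, the subtasks genuinely run at the same time; on a single CPU the same programme still works, but the operating system merely interleaves the threads.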
PARAM Supercomputers
PARAM is an acronym for PARAllel Machine. The PARAM Supercomputer is a distributed-memory, message-passing Parallel Computer. The following table lists the PARAM family of Supercomputers:
Supercomputer        Processor in Compute Engine        Design Philosophy
PARAM 8000           INMOS Transputer                   MPP
PARAM 8600           i860                               MPP
PARAM 9000           Sun's SPARC, Alpha, PowerPC        MPP and Cluster
PARAM Open Frame     Sun's UltraSPARC                   Cluster
Levels of Parallelism
Applications are often classified according to how frequently their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they communicate far less often; and it is embarrassingly parallel if they rarely or never communicate. Embarrassingly parallel applications are considered the easiest to parallelize. The level of parallelism is decided by the size of the lumps of code (the grain size) that are potential candidates for parallelism. The table below categorizes code granularity for parallelism:
Grain Size      Code Item                                               Parallelized by
Large/Coarse    Programme: separate heavyweight process (Task-Level)    Programmer
Medium          Standard one-page function (Control-Level)              Programmer
Fine            Loop / instruction block (Data-Level)                   Parallelizing Compiler
Very Fine       Instruction (Multiple-Instruction-Level)                Processor
All of the foregoing approaches share a common goal: to boost processor efficiency by hiding latency. To conceal latency, however, there must be another thread ready to run whenever a lengthy operation occurs. The idea is to execute concurrently two or more single-threaded applications, such as compiling, text formatting, database searching, and device simulation.
Available instruction-level parallelism means that particular instructions of a programme may be executed in parallel. The instructions in question can be either assembly (machine-level) or high-level-language instructions, though instruction-level parallelism is usually understood at the machine-language (assembly-language) level. In addition, when considering instruction-level parallelism we confine ourselves to instructions expressing more or less elementary operations, such as an instruction prescribing the addition of two scalar operands, as opposed to multi-operation instructions such as those implying vector or matrix operations.
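The contrast can be seen in a small C fragment. This is an illustrative sketch only: the first pair of statements contains independent elementary operations that a superscalar processor may issue in the same cycle, whereas the third statement forms a dependent chain that offers no instruction-level parallelism.
```c
/* Illustrative only: same kind of work, different amounts of
 * instruction-level parallelism. */
void ilp_example(double a, double b, double c, double d, double *out)
{
    /* Independent operations: these two multiplies have no data
     * dependence, so the hardware may execute them in parallel. */
    double x = a * b;
    double y = c * d;

    /* Dependent chain: the second multiply must wait for the first,
     * so no instruction-level parallelism is available here. */
    double z = (a * b) * c;

    *out = x + y + z;
}
```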
Parallelism may also be available at the loop level. Here subsequent loop iterations are candidates for parallel execution. However, data dependences between subsequent loop iterations, called recurrences, may restrict their parallel execution. The potential speed-up is proportional to the loop limit or, in the case of nested loops, to the product of the limits of the nested loops. Loop-level parallelism is a promising source of parallelism.
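The following sketch, written here with an OpenMP directive purely for illustration (the PARAM environment uses its own mechanisms), shows a loop whose iterations are independent and can be executed in parallel, alongside a loop with a recurrence that cannot be parallelized as written.
```c
/* Sketch of loop-level parallelism, illustrated with OpenMP. */
#include <omp.h>

void loop_parallelism(int n, double *a, const double *b)
{
    /* Independent iterations: each a[i] depends only on b[i],
     * so the iterations can be distributed across processors. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];

    /* Recurrence: a[i] depends on a[i-1] from the previous
     * iteration, so this loop cannot be parallelized as written. */
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```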
Next, there is also parallelism available at the procedure level, in the form of procedures that can be executed in parallel. The extent of parallelism exposed at this level depends mainly on the kind of problem solution considered.
In addition, different programmes (users) are obviously independent of each other, so parallelism is also available at the user level (which we consider coarse-grained parallelism). Multiple, independent users are a key source of parallelism in computing scenarios. Evidently, the different levels of parallelism in a problem solution are not exclusive; they may coexist at the same time.
Among the four levels of parallelism, the PARAM supports medium- and large-grain parallelism explicitly. Instruction-level parallelism, however, is supported by the processor used to build the compute engine of the PARAM. For instance, the compute engine in the PARAM 8600 is based on the i860 processor, which is capable of executing multiple instructions concurrently.
A programmer can use the PARAS programming environment to parallelize an application. Basic thread-level and task-level programming on the PARAM is supported by the PARAS microKernel in the form of primitive services. More sophisticated programming environments are built on top of the microKernel services in the form of subsystems. Some of the prominent and powerful subsystems built are CORE, MPI, POSIX threads, and port-group communication systems.
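As a flavour of message-passing programming, the sketch below uses the standard MPI interface (one of the subsystems named above); it is a generic example and makes no assumptions about PARAS internals. Process 0 sends an integer to process 1 over the interconnect.
```c
/* Minimal message-passing sketch using the standard MPI interface. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id     */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes   */

    if (rank == 0 && size > 1) {
        token = 42;
        /* Process 0 sends a value to process 1. */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process %d of %d received %d\n", rank, size, token);
    }

    MPI_Finalize();
    return 0;
}
```
Each MPI process has its own address space, so the only way to share data is to send it explicitly, which mirrors the distributed-memory, message-passing design of the PARAM machines.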
Not all parallelization results in speed-up. Generally, as a task is split up into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other. Eventually, the overhead from communication dominates the time spent solving the problem, and further parallelization (that is, splitting the workload over even more threads) increases rather than decreases the amount of time required to finish. This is known as parallel slowdown, and it must be taken into account when deciding how far to parallelize.
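A toy model makes the effect concrete. The numbers below are assumed purely for illustration: the total work is fixed, and each additional processor adds a fixed communication cost. Beyond a certain processor count the estimated time rises again, which is exactly parallel slowdown.
```c
/* Toy model of parallel slowdown: compute time shrinks as work is
 * split across p processors, but communication overhead grows with p. */
#include <stdio.h>

int main(void)
{
    const double work         = 1000.0; /* total compute time on 1 CPU (assumed) */
    const double comm_per_cpu = 5.0;    /* overhead added by each extra CPU (assumed) */

    for (int p = 1; p <= 64; p *= 2) {
        double t = work / p + comm_per_cpu * (p - 1);
        printf("p = %2d  estimated time = %7.1f\n", p, t);
    }
    /* Past some p, the communication term dominates and the total
     * time starts rising again. */
    return 0;
}
```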


