Every copy of Mathematica 7 comes with four computation processes included; more processes, as well as network capabilities, can be added easily. Parallel computing is an important next step in increasing technical computing performance because virtually all new computers are multicore. Mathematica automatically distributes tasks over the available processes, optimizing for the installed hardware. Integrating parallel technology into the product has a number of key advantages over offering it as an add-on.
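Mathematica handles this distribution transparently (through functions such as Parallelize and ParallelMap); as a rough illustration of the underlying pattern, here is a minimal Python sketch that farms CPU-bound tasks out to one worker process per available core. The task function and inputs are invented for the example.

```python
# Analogous illustration (in Python, not Wolfram code) of distributing
# independent tasks over the available computation processes.
from multiprocessing import Pool, cpu_count

def expensive_task(n):
    """Stand-in for a CPU-bound computation (here: a sum of squares)."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [10_000, 20_000, 30_000, 40_000]
    # One worker process per available core, mirroring the bundled
    # computation kernels; the pool schedules tasks onto idle workers.
    with Pool(processes=min(cpu_count(), len(tasks))) as pool:
        results = pool.map(expensive_task, tasks)
```

The pool takes care of assigning each task to whichever worker is free, which is the same "optimize for the installed hardware" idea in miniature.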
In particular, it enables software developers to rely on their clients using parallel-enabled Mathematica or Player Pro. Computable data sources, introduced in Mathematica 6, are a unique and popular innovation because of the ease with which data can be used within Mathematica.
Mathematica 7 builds on this with major additions, including the complete human genome as well as weather, astronomical, GIS, and geodesy data. GPGPUs: These components are fast emerging as another way to achieve acceleration using parallel resources already present in PCs and servers. Originally designed to handle graphics-oriented processing in parallel with the CPU's general-purpose work, GPUs can now take on nongraphical processing tasks, and hardware vendors are promoting this approach with systems built around multiple powerful GPUs.
Figure 1: Processes are distributed to idle resources on a local network using process virtualization.
In scenarios involving simple parallelization challenges, in which the target application is highly isolated, embarrassingly parallel (or close to it), and can settle for reasonable acceleration results without requiring investment in high-end infrastructure, it may be practical to develop an application-specific distributed computing implementation. The simplest example would involve running different parts of the application in parallel on separate, predefined servers.
The relative simplicity of the target application might make the development and maintenance costs of creating a proprietary system comparable to, or even lower than, those of adapting a commercial system.
Another advantage of this approach is the high level of flexibility a proprietary system affords. However, for almost any scenario beyond the simplest, developing a parallel computing implementation in-house is likely to result in costly ongoing maintenance and complications in handling issues that generic systems already address, such as error handling, availability, scalability, dynamic resource allocation, management, and reporting.
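The "predefined servers" pattern described above can be sketched as follows. The server names are hypothetical, and the remote calls are simulated locally with threads; a real implementation would dispatch each chunk over SSH or an RPC layer.

```python
# Minimal sketch of an application-specific split of an embarrassingly
# parallel job across predefined servers. Server names and the chunking
# scheme are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["node-a", "node-b", "node-c"]  # hypothetical, hardcoded hosts

def run_on_server(server, chunk):
    # Stand-in for remote execution: in practice, an RPC or SSH call.
    return server, sum(x * x for x in chunk)

def split(data, n):
    """Round-robin the work units across n predefined servers."""
    return [data[i::n] for i in range(n)]

data = list(range(100))
chunks = split(data, len(SERVERS))
with ThreadPoolExecutor(max_workers=len(SERVERS)) as ex:
    partials = list(ex.map(run_on_server, SERVERS, chunks))
total = sum(p for _, p in partials)
```

Because the work units are independent, combining the partial results is a simple reduction; this is exactly the property that makes such in-house implementations feasible.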
A computing cluster is a group of servers dedicated to sharing an application's workload. A dedicated computing environment such as this eliminates the need for virtualization (see the previous section on highly isolated versus environment-dependent processes) and offers effective central administration of the cluster. The main downside of this approach is the investment required to acquire and maintain the dedicated hardware. Cluster-based systems can be combined with high-throughput storage as well as network hardware and software to optimize performance for data-bound applications with high-end performance requirements.
Grid computing is similar to cluster computing in the sense that it involves a group of computers dedicated to solving a common problem, but differs from cluster computing by allowing a mixture of heterogeneous systems (different OSs and hardware) in the same grid. Grid systems also do not limit usage to a single application and enable more distributed control and administration of the systems connected to the grid.
Finally, grids allow the largest scale of distributed system architecture in terms of the number of nodes involved, with large systems sometimes reaching many thousands of interconnected nodes. Some grid systems not only utilize the combined computing power of dedicated servers, but also allow PCs and workstations to contribute spare processor cycles to the grid even while they are running other computing tasks. For example, a user writing a document using a word processing tool such as Microsoft Word could simultaneously contribute 80 to 90 percent of idle processing power to computing tasks running on the grid.
This simultaneous utilization can dramatically increase the grid's potential computing power; however, in order to achieve this, the application running on the grid requires modification to use the grid system's APIs. The more environment-dependent the application is, the more extensive the changes will be to the application code to allow it to utilize available computing power on nondedicated machines.
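A minimal sketch of the idle-cycle idea, assuming a Unix-like host: a worker drains a task queue only while the per-core load average leaves headroom. The 0.8 threshold is an illustrative stand-in for the 80 to 90 percent figure above; real grid clients use far richer policies exposed through their APIs.

```python
# Hedged sketch of "contribute only idle cycles": pull grid tasks only
# while the machine's 1-minute load average leaves headroom.
import os
import queue

def machine_is_idle(threshold=0.8):
    """True if per-core load is below the (illustrative) threshold."""
    try:
        load_per_core = os.getloadavg()[0] / (os.cpu_count() or 1)
    except OSError:  # getloadavg is unavailable on some platforms
        return True
    return load_per_core < threshold

def drain(tasks, results):
    """Process queued tasks, but only while the machine looks idle."""
    while not tasks.empty():
        if not machine_is_idle():
            break  # yield the CPU back to the interactive user
        results.append(tasks.get() ** 2)

tasks, results = queue.Queue(), []
for n in range(5):
    tasks.put(n)
drain(tasks, results)
```

The point of the sketch is the policy check between tasks: the worker backs off the moment the machine is busy, which is what lets nondedicated machines participate without disturbing their users.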
Grid computing systems are, in general, the distributed parallel processing offering with the most comprehensive feature set and capabilities. As such, they also tend to be quite complex in terms of required expertise, both in development efforts migrating existing code to the platform APIs and ongoing maintenance and administration efforts. It is therefore recommended to evaluate these aspects when considering a grid-based approach.
Grid systems can be commercial or open source.
Open-source systems are less expensive but tend to leave open ends (scheduling, management, and physical implementation aspects not covered by the project) that require either in-house development or collaboration with the project's development community. It is therefore important to carefully assess the total cost of ownership involved in completing the missing components in open-source systems. Several commercial grid computing products provide fuller feature sets. Grid computing products tend to be at the highest end of the price range for parallel distributed systems.
As with cluster-based systems, grid-based systems can be combined with high-end products to relieve network and storage bottlenecks. Public clouds, such as Amazon's EC2 and Microsoft's Azure platform, are a form of computing in which the user purchases computing power from a virtualized compute farm over the Internet, as opposed to private clouds, which run on computers hosted on an organization's own premises. Payment models are flexible, allowing users to grow and shrink their computing power according to requirements and to pay only for the computing power actually used over time.
This greatly reduces the need to make long-term investments in on-site hardware and infrastructure. Public clouds have traditionally been used for business applications with an emphasis on load-balancing requirements rather than accelerating computing processes, but public cloud high-performance computing systems are gaining popularity.
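A back-of-the-envelope sketch of the pay-per-use model: the fleet is resized each hour to match queued work, and cost tracks the instance-hours actually provisioned. The hourly rate, per-instance throughput, and demand figures are made up for illustration.

```python
# Illustrative-only numbers: a hypothetical price and workload, used to
# contrast elastic provisioning with an always-on fleet sized for the peak.
HOURLY_RATE = 0.10  # hypothetical price per instance-hour, in dollars

def instances_needed(jobs_queued, jobs_per_instance_hour=100):
    """Scale the fleet to the work at hand (at least one instance)."""
    return max(1, -(-jobs_queued // jobs_per_instance_hour))  # ceiling division

def elastic_cost(hourly_demand):
    """Pay only for the instance-hours actually provisioned each hour."""
    return sum(instances_needed(jobs) * HOURLY_RATE for jobs in hourly_demand)

demand = [50, 400, 1200, 80]            # jobs queued in four successive hours
fixed = 12 * HOURLY_RATE * len(demand)  # always-on fleet sized for the peak
cost = elastic_cost(demand)             # elastic fleet tracks the demand
```

With this spiky demand, the elastic fleet provisions 1, 4, 12, and 1 instances across the four hours, so it costs a fraction of keeping the peak-sized fleet running throughout.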
One approach is to dedicate a compute cluster preinstalled with the runtime environment and files the distributed application requires. This meets the application's requirements but demands investment in dedicated servers and does not take advantage of the computing power available in existing PCs and workstations connected to the network.
It also requires maintaining the cluster and making sure it always runs an up-to-date version of the runtime and data environment. Virtualization allows servers to change the runtime environment on demand by loading a different system image each time, thereby improving manageability and increasing flexibility. However, virtual image initialization forms an additional bottleneck and, as in cluster systems, does not effectively utilize the sometimes vast amounts of idle processing power on existing computers. Some grid platforms provide APIs that, when integrated into the application code, allow the use of remote machine resources without requiring extensive preconfiguration of these machines.
In some cases, this effectively enables nondedicated machines to connect to the grid and contribute their idle processing power. However, this is applicable only in certain scenarios and in most cases requires extensive modification of the application code.