CS342: Algorithms for Parallel and Distributed Systems

Active Exercise 5b: Parallelizing the Relational Join

In this exercise, we continue with issues in Parallel Database Systems, studying

the problem of parallelizing the join operation.

The Join: For those of you who have not studied relational databases, we present here

a very informal discussion of everything you need to know for this assignment.

Think of a relation as a collection of tuples. So, for example, suppose

a bank had records on its account holders. Each record consisted of several fields:

  1. Customer ID Number.
  2. Customer Age
  3. Customer Checking Balance
  4. Customer Address

The relation that captured this information would be a collection of these tuples.

Often you want to join two relations together. If you have two relations

R1 and R2, you can join R1 and R2 on a field f1 in R1 and f2 in R2.

The result is a new relation R3 whose tuples/records are the union of records

in R1 and R2; for every record r1 in R1 and r2 in R2 such that the f1 field of

r1 is equal to the f2 field of r2, there is a new record in R3 that is the union

of r1 and r2.

The Parallel Problem: Consider the situation in which you have two relations,

R1 and R2, that are spread out over P processors; e.g. each processor contains

1/P of the records of R1 and of R2. For simplicity we'll assume that all the

values in each relation are integers. The goal of this exercise is for you to

implement an algorithm that will execute an efficient parallel join on any

pair of fields in the two relations.

Data: You will find in /data on the local (non-ebbets) workstations two files

rel1 and rel2. Each has four integer fields; it is the first field of rel1

on which you should carry out the sorting exercise.

Your report on this project should of course discuss the issues in a parallel

implementation and explain your algorithmic choices, and give results

of experiments in a scientific fashion.


CS342 || Introduction || Laboratory || Syllbus || Homeworks || Active Assignment || Polytechnic || CIS Dept.
webmaster@ebbets.poly.edu