July 17, 2013

PODS 2013

Filed under: Uncategorized — paris koutris @ 9:49 am

There has been a long time since we last posted at the UWDB blog. However, coming back from SIGMOD/PODS 2013 in New York is a good opportunity to start posting again.

Our group participated at PODS with the paper “Communication Steps for Parallel Query Processing” by Paul Beame, Paris Koutris and Dan Suciu. Here is the link for the PODS paper and here is the link for full version.

In this paper, we study the problem of computing a relational query q on a large input database of size n, using a large number p of servers. In our model, the computation is performed in synchronous rounds, where each round includes some local computation followed by global communication between all servers. Further, we restrict the amount of data each server can receive during some computation: each server can receive only O(n/p^(1-ε)) bits, where ε is a parameter in [0,1) that controls replication and we call the space exponent. For example, if ε=1, one could send all data to one server and compute the result locally in one round. On the other end, if ε=0, each server receives O(n/p) data and so the data is equally distributed between the servers.  The restriction on communication/replication creates a trade-off between the number of rounds (r) and space exponent (ε), and computing this trade-off for Conjunctive Queries is the main subject of our work.

For a single round, r=1,  we show lower bounds and upper bounds that are matching. We prove that a conjunctive query q cannot be computed with space exponent ε < 1-1/τ*(q), where τ*(q) is the fractional vertex cover number for the hypergraph of query q. Our lower bounds are particularly strong, since we make no assumption about the way information is exchanged or about the computational capabilities of the servers. Additionally, we present an algorithm for one round that matches the lower bound, i.e. works with space exponent ε >= 1-1/τ*(q), for a specific class of skew-free databases. As an application of our 1-round result, consider the “triangle” query Q(x,y,z) = R(x,y), S(y,z), T(z,x). The hypegraph of Q has a fractional vertex cover of size 3/2, so the triangle query needs space exponent at least ε=1/3 to be computed. For multiple rounds of communication, we present lower bounds in a weaker model where the routing decisions for data are tuple-based. The lower bounds for multiple rounds are, to the best of our knowledge, the first of their kind. Interestingly, we also obtain as a corollary that transitive closure of a graph needs a non-constant number of rounds to be computed, for any space exponent.

There are many exciting questions that remain open. Can we show lower bounds for multiple rounds in the case where there is no assumptions on the kind of communication? What happens in the presence of data skew? Also, what can we say about the space exponent for other problems apart from relational queries?


Blog at WordPress.com.