Abstract:
We focus on summarizing hierarchical data by adapting the well-known notion of end biased-histograms to trees. Over relational data, such histograms have been well-studied, as they have a good balance between accuracy and space requirements. Extending histograms to tree data is a non-trivial problem, due to the need to preserve and leverage structure in the output. We develop a fast greedy algorithm, and a polynomial algorithm that finds provably optimal hierarchical end-biased histograms. Preliminary experimentation demonstrates that our histograms work well in practice.