
Thesis Topics 2022

This is an overview of topics offered for undergraduate and graduate labs, Bachelor thesis projects, and Master thesis projects. Note that topics can be adjusted (e.g., a lab topic taken as a graduate lab would additionally require writing a paper, as compared to an undergrad lab). Contact me for further topics, or if you have your own idea that fits into the research theme of Big Array Data.

All work follows the procedures, so you may want to study these first. Programming prerequisites should be taken seriously: in all cases, non-trivial implementation in some language is involved. Code will regularly add functionality to our rasdaman system and, as such, be used by our project partners and the general scientific and technical community; hence code quality (including, e.g., concise tests and documentation) is an integral evaluation criterion. Generally, I appreciate not only the result, but also the way towards it; therefore, showing continuous progress, initiative, and well-planned work is certainly an asset. Knowledge characterized as "advantageous" is not mandatory, but lacking it will increase the workload significantly and make deadlines tight. We reserve the right not to assign a topic to a student if the risk is too high that a good result will not be achieved; this is for the student's sake.

If your report is of sufficient quality to be submitted successfully to a conference or journal for publication, this will be considered a strong plus.

Note that only the topics below will be accepted for supervision, due to resource constraints.

Overview

[ Dynamic Repartitioning of Large Arrays | Benchmark WCS/WMS against GeoServer | Augmented Reality for geo-visualization using rasdaman | Leaflet Timeseries frontend support | Raster-to-Vector Conversion | Benchmark Null Mask Representation on Large Data | A Time Slider for Time Selection in Datacubes | Vector files as Datacube Query Parameter | Interval arithmetics in a Datacube Query Language ]

Dynamic Repartitioning of Large Arrays

  • topic: Large arrays - in particular: larger than main memory - are stored on disk partitioned ("tiled") into subarrays, which allows retrieving partial arrays through "subsetting" without loading the complete array into RAM. Even some data formats, like TIFF and NetCDF, support such internal partitioning. Array DBMSs can hide such partitioning by managing it internally. Advanced systems let the administrator define particular tiling schemes, so that the storage structure can be tuned to query workloads. Typically, systems allow only regular tiling (i.e., equi-sized partitions); the most advanced system in this respect, rasdaman, supports arbitrary tile structures, defined through a storage layout sub-language. However, an initial tiling is not enough - sometimes query patterns change, and then the tiling should be re-adjusted. Obviously, this involves physically reshaping and copying tiles on disk, which is expensive. Optimizing it is therefore highly desirable.
    The task at hand is to devise an algorithm which, given an existing and a target tiling pattern, performs a minimum number of copying steps to transform the stored array from the former to the latter structure. This algorithm is to be embedded in the UPDATE statement of the array query language. Both theoretical considerations and a benchmark should substantiate that the result is optimal. Implementation will be done on the open-source rasdaman community Array DBMS.
  • team size: 1
  • prerequisites: C++
  • classification: algorithm design, language integration
  • particularities: the query parser is implemented in flex and bison
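
To make the notion of a "copying step" concrete, here is a minimal sketch, assuming regular tilings on a 2D domain and assuming that every non-empty overlap between an old and a new tile costs one copy. It is not rasdaman code and makes no claim of optimality; the function names are hypothetical. It merely enumerates a naive copy plan that an optimized algorithm would have to match or beat.

```python
from itertools import product

def regular_tiles(extent, tile_size):
    """Split the domain [0, extent) into regular tiles.

    Each tile is a tuple of (lo, hi) pairs, one per axis, upper bound exclusive.
    """
    axes = []
    for length, step in zip(extent, tile_size):
        axes.append([(lo, min(lo + step, length)) for lo in range(0, length, step)])
    return list(product(*axes))

def intersect(a, b):
    """Return the overlap of two tiles, or None if they do not overlap."""
    box = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo >= hi:
            return None
        box.append((lo, hi))
    return tuple(box)

def copy_plan(old_tiles, new_tiles):
    """List (old_tile, new_tile, overlap) triples, one per assumed copy step.

    Tiles that are identical in both tilings are skipped: they can stay in place.
    """
    plan = []
    for new in new_tiles:
        for old in old_tiles:
            overlap = intersect(old, new)
            if overlap is not None and old != new:
                plan.append((old, new, overlap))
    return plan

if __name__ == "__main__":
    old = regular_tiles((4, 4), (2, 2))   # four 2x2 tiles
    new = regular_tiles((4, 4), (4, 2))   # two 4x2 tiles
    for step in copy_plan(old, new):
        print(step)
```

The plan size gives a simple cost measure to compare candidate strategies against in a benchmark.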

Benchmark WCS/WMS against GeoServer

  • topic: The rasdaman datacube engine offers standards-based access to massive multi-dimensional geo raster data. GeoServer is a commonly used tool supporting vector data and metadata and, to some extent, raster data.
    The task at hand is to set up a benchmark comparing the two on the features both support. A rasdaman instance with sufficient spatio-temporal data exists; a GeoServer instance will need to be set up and fed with data on a VM that will be provided.
  • team size: 1
  • prerequisites: Web, JavaScript, geo services
  • classification: systematic benchmark
  • particularities: needs immersion into geo raster data
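
As a starting point for such a benchmark, a timing harness could look like the following sketch. It assumes each request is wrapped in a zero-argument callable (for example, one that issues the same request against either server); the function name and parameters are hypothetical, and a real benchmark would also need to control caching, concurrency, and network effects.

```python
import statistics
import time

def benchmark(request_fn, repetitions=5, warmup=1):
    """Time a zero-argument callable; return summary statistics in seconds."""
    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        request_fn()
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        request_fn()
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "runs": len(samples),
    }

if __name__ == "__main__":
    # Placeholder workload standing in for a real service request.
    print(benchmark(lambda: sum(range(100_000))))
```

Reporting the median alongside minimum and maximum makes outliers visible without letting them dominate the comparison.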

Augmented Reality for geo-visualization using rasdaman

  • topic: The rasdaman services available (see these demos) offer multi-dimensional, typically spatio-temporal datacubes. The task at hand is to develop a front-end, using the standards-based APIs, which takes these data and provides an immersive virtual-reality experience using some appropriate device.
  • team size: 1
  • prerequisites: VR, geo data, API programming
  • classification: system integration
  • particularities: needs immersion into geo raster data

Leaflet Timeseries frontend support

  • topic: Leaflet is a web frontend for geo map data which has recently been enhanced with capabilities to display timeseries. The rasdaman datacube services (see these demos, some of which already use Leaflet), on the other hand, offer spatio-temporal datacubes.
    The task at hand is to implement and demonstrate the timeseries capabilities of Leaflet on top of rasdaman, through standards-based interfaces; these are OGC WMS, WCS, and WCPS.
  • team size: 1
  • prerequisites: Web services, geo services
  • classification: Web API programming
  • particularities: needs immersion into geo raster data
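
For orientation, a timeseries frontend ultimately requests one map image per time step. The sketch below builds WMS 1.3.0 GetMap URLs with a TIME parameter; the endpoint and layer name are placeholders, not actual rasdaman services, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

def getmap_url(endpoint, layer, bbox, time_stamp, width=512, height=512):
    """Build a WMS 1.3.0 GetMap URL for one time step.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses
    latitude/longitude axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time_stamp,
    }
    return endpoint + "?" + urlencode(params)

if __name__ == "__main__":
    # Hypothetical endpoint and layer, for illustration only.
    steps = ["2020-01-01T00:00:00Z", "2020-02-01T00:00:00Z"]
    for t in steps:
        print(getmap_url("https://example.org/ows", "demo_layer", (-90, -180, 90, 180), t))
```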

Raster-to-Vector Conversion

  • topic: The task at hand is to add a function to the rasdaman community Array DBMS which, given a 2D raster, returns a polygon which tightly delineates the values in the array. This will be the bounding box if there are no null values. When null values appear, these are considered outside the array, so boundaries need to be constructed. In general, therefore, a multipolygon will be returned. To this end, a suitable output format encoding needs to be added, such as Shapefile. Resulting polygons need to be optimized so that sequences of line segments going in the same direction are conflated into a single line. The function needs to be tested for correctness, and a benchmark needs to be established to show performance.
  • team size: 1
  • prerequisites: C++, Linux
  • classification: implementation
  • particularities: Work with flex and bison for enhancing the query language
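
The conflation of same-direction line sequences mentioned above can be illustrated with a small sketch. It assumes a ring is given as a list of (x, y) vertices without repeating the first vertex at the end; the function name is hypothetical, and the actual implementation would be written in C++ inside rasdaman.

```python
def simplify_ring(ring):
    """Drop vertices whose incoming and outgoing edges point the same way.

    ring: closed polygon ring as a list of (x, y) vertices, first vertex
    NOT repeated at the end.
    """
    n = len(ring)
    kept = []
    for i in range(n):
        px, py = ring[i - 1]          # previous vertex (wraps around)
        cx, cy = ring[i]              # current vertex
        nx, ny = ring[(i + 1) % n]    # next vertex (wraps around)
        # Zero cross product: the two edges are parallel.
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        # Positive dot product: they point in the same direction.
        dot = (cx - px) * (nx - cx) + (cy - py) * (ny - cy)
        if cross != 0 or dot <= 0:
            kept.append((cx, cy))
    return kept

if __name__ == "__main__":
    # Outline of a 2x2 cell block traced cell edge by cell edge.
    outline = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    print(simplify_ring(outline))
```

Tracing cell boundaries naturally produces many unit-length segments, so this step can shrink the output considerably before encoding.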

Benchmark Null Mask Representation on Large Data

  • topic: Null values in databases are a placeholder for a value which is not available, for a number of potential reasons such as nothing measured, nothing provided (yet), no value known, etc. Internally, the fact that some value is null must be stored appropriately, and the query evaluation process must take this into account properly in all computations, such as aggregation, where null values have to be disregarded. Often, a null mask is stored where one bit for each data slot indicates whether that slot is null or not. Obviously, it is of critical importance for good performance to find efficient ways of evaluating such null masks in the course of query processing.
    The task at hand is to assess a few known techniques by building a benchmark suite and using it to evaluate each approach in a C++ environment, candidates being bool[] arrays, vector<bool>, and roaring bitmaps. The situations tested for (i) execution time and (ii) memory usage include: varying sparsity, varying number of buckets (such as 1, 10, 100, 1000), and varying bucket size (such as 1, 5, 10 MB). Operations to be investigated include: generating a null mask given a bucket and a set of values to be considered null; unary operations on a tile with a null mask; binary operations on buckets in the presence of null masks on each operand; NDVI as a moderately complex operation on values.
  • team size: 1
  • prerequisites: C++, Linux
  • classification: benchmarking
  • particularities: For immersion into the topic it is important to select and study benchmark papers for learning how to conduct, evaluate, and document some feature under test.
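
The operations listed above can be sketched with a bit-packed mask as follows. This is an illustration only (the benchmark itself targets C++ representations); the function names are hypothetical, and treating a zero denominator in NDVI as null is an assumption made here, not a requirement from the topic description. NDVI is computed as (NIR - Red) / (NIR + Red).

```python
def build_null_mask(values, null_values):
    """Return a bit-packed mask (bytearray); bit i is 1 if values[i] is null."""
    nulls = set(null_values)
    mask = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v in nulls:
            mask[i // 8] |= 1 << (i % 8)
    return mask

def is_null(mask, i):
    """Check whether slot i is flagged as null."""
    return bool(mask[i // 8] & (1 << (i % 8)))

def masked_sum(values, mask):
    """Aggregate while disregarding null slots."""
    return sum(v for i, v in enumerate(values) if not is_null(mask, i))

def ndvi(nir, red, nir_mask, red_mask):
    """Cell-wise NDVI; a result cell is null if either operand is null.

    Assumption: cells with nir + red == 0 are also flagged as null.
    """
    # Binary operation: the result mask starts as the OR of both operand masks.
    out_mask = bytearray(a | b for a, b in zip(nir_mask, red_mask))
    out = []
    for i, (n, r) in enumerate(zip(nir, red)):
        if is_null(out_mask, i) or n + r == 0:
            out_mask[i // 8] |= 1 << (i % 8)
            out.append(0.0)  # placeholder value under a null flag
        else:
            out.append((n - r) / (n + r))
    return out, out_mask

if __name__ == "__main__":
    nir = [8, 0, 6, 255]
    red = [2, 0, 2, 4]
    nir_mask = build_null_mask(nir, [255])
    red_mask = build_null_mask(red, [255])
    print(masked_sum(nir, nir_mask))
    print(ndvi(nir, red, nir_mask, red_mask))
```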

A Time Slider for Time Selection in Datacubes

  • topic: Spatio-temporal datacubes can be visualized in various ways. Typically, user-friendly area selection is only available for the spatial dimensions and not for time.
    The task is to integrate the EOX timeslider with NASA WorldWind, thereby replacing the initial implementation currently available. This integration needs to be documented in a way that allows adding it to further tools later, such as Cesium. A special twist is that the EOX slider, which is based on the D3 framework, is programmed in CoffeeScript; this needs to be integrated into the common rasdaman environment, which is JavaScript-based.
  • team size: 1
  • prerequisites: JavaScript
  • classification: browser GUI tool integration
  • particularities: D3, CoffeeScript

Vector Files as Datacube Query Parameter

  • topic: OGC Web Coverage Processing Service (WCPS) is a geo datacube query language with integrated spatio-temporal semantics based on the notion of a multi-dimensional coverage which may represent a datacube. Queries can be parametrized, among other things with vector polygons, which allow "cutting out" arbitrary regions. Currently, these vectors have to be provided in an ASCII representation called Well-Known Text (WKT). However, the most widely used format in the geo universe is not WKT, but ESRI Shapefiles, a binary format.
    The goal is to add support for the Shapefile format for vector upload in the petascope component of rasdaman, next to the existing WKT decoder. Open-source libraries for decoding exist, for example GeoTools and shapelib; one of those should be used. Appropriate tests should be established to demonstrate that the Shapefile decoder works properly.
  • team size: 1
  • prerequisites: Java, Linux
  • classification: query language enhancement
  • particularities: -
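
To show what the existing WKT input looks like, the following sketch serializes a list of vertices as a WKT polygon. The vertices are hard-coded here; in a Shapefile decoder they would come from the decoding library. The function name is hypothetical, and this is not petascope code.

```python
def ring_to_wkt_polygon(vertices):
    """Serialize one outer ring of (x, y) vertices as a WKT POLYGON string."""
    ring = list(vertices)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings repeat the first vertex at the end
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"

if __name__ == "__main__":
    print(ring_to_wkt_polygon([(0, 0), (10, 0), (10, 10)]))
```

A decoder test could compare the polygon obtained from a Shapefile against the equivalent WKT input to check that both yield the same cutout.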

Interval arithmetics in a Datacube Query Language

  • topic: The rasdaman array query language, rasql, uses multi-dimensional intervals (mintervals) to specify cutouts from a multi-dimensional array ("datacube"). In places it would be handy to have expressions available on mintervals, such as interval union, intersection, and difference.
    The goal is to extend rasql with support for such minterval arithmetic. Appropriate tests should be established to demonstrate that the code works properly.
  • team size: 1
  • prerequisites: C++, Linux
  • classification: query language enhancement
  • particularities: only as project, not thesis
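
The intended semantics can be sketched as follows, assuming a minterval is a list of inclusive (lo, hi) bounds per axis, written in rasql as e.g. [0:9,0:9]. Since the union of two boxes is generally not a box, the sketch uses the bounding hull as a stand-in; how union and difference should actually behave is part of the topic. All names are hypothetical, and the real extension would be implemented in C++.

```python
def intersection(a, b):
    """Intersect two mintervals; return None if the result is empty."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    result = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:
            return None
        result.append((lo, hi))
    return result

def hull(a, b):
    """Smallest minterval containing both operands (stand-in for union)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return [(min(alo, blo), max(ahi, bhi)) for (alo, ahi), (blo, bhi) in zip(a, b)]

def to_rasql(minterval):
    """Format a minterval in rasql bracket notation."""
    return "[" + ",".join(f"{lo}:{hi}" for lo, hi in minterval) + "]"

if __name__ == "__main__":
    a = [(0, 9), (0, 9)]
    b = [(5, 14), (3, 4)]
    print(to_rasql(intersection(a, b)))
    print(to_rasql(hull(a, b)))
```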

Copyright © 2004+ Peter Baumann -- tel. +49-173-583 7882