Skip to content
- Thinking about data
- This module introduces the idea of computational thinking, and how big data can make simple problems quite challenging to solve. We use the example of calculating the median and mean stack of a set of radio astronomy images to illustrate some of the issues you encounter when working with large datasets.
- Big data makes things slow
- In this module we explore the idea of scaling your code. Some algorithms scale well as your dataset increases, but others become impossibly slow. We look at some of the reason for this, and use the example of cross-matching astronomical catalogues to demonstrate what kind of improvements you can make.
- Querying your data
- Most large astronomy projects use databases to manage their data. In this module we introduce SQL - the language most commonly used to query databases. We use SQL to query the NASA Exoplanet database and investigate the habitability of planets in other solar systems.
- Managing your data
- This module introduces the basic principles of setting up databases. We look at how to set up new tables, and then how to combine Python and SQL to get the best out of both approaches. We use these tools to explore the life of stars in a stellar cluster.
- Learning from data: regression
- This module introduces the idea of machine learning. We look at standard methodology for running machine learning experiments, and then apply this to calculating redshifts of distant galaxies using decision trees for regression.
- Learning from data: classification
- In this final module we explore the limitations of decision tree classifiers. We then look at ensemble classifiers, using the random forest algorithm to classify images of galaxies into different types.