Can analytics take us all the way to outer space?

With large telescopes and new, increasingly sensitive instruments, we can observe and survey the universe in more detail than ever before. As in so many other fields of science and business, astronomy has been fundamentally affected by digitalization and the exponential growth of data volumes over the past decades. The volume of astronomical data is now estimated to double each year, and this trend is expected to continue. This flood of data has increased the importance of advanced analytical and statistical methods in data analysis.

One illustrative example is provided by research into galaxy clusters. Galaxy clusters are the largest structures in the universe. Their diameters can measure millions of light-years, and their mass can be over a million billion times the mass of the Sun. Because they are so large, we can measure the properties of the entire universe, including dark energy, by studying their number and development history. To do this, we need to observe as many galaxy clusters as possible and determine their masses as accurately as possible.

The best way to find galaxy clusters is with the help of the X-ray radiation emitted by the hot intergalactic gas. This radiation can be observed with X-ray telescopes located in space. A few thousand galaxy clusters have already been identified in this way. It is estimated that in the next few years this figure will grow to around 100,000, which is a significant fraction of the estimated number of galaxy clusters in the whole observable universe. This would make it possible to map the majority of the largest building blocks in the universe. A bottleneck for the cosmological studies mentioned above remains, though: mass is difficult to measure, and accurate masses can be obtained for only a few thousand targets using gravitational lensing.

However, it is possible to create a model that predicts the mass of a galaxy cluster from the absolute brightness of its X-ray radiation. The absolute brightness, or luminosity, can be determined once the target has been observed and its distance from the Earth has been measured.
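As a rough illustration of that last step, the sketch below converts an observed X-ray flux and a distance into a luminosity using the inverse-square law, L = 4 * pi * d^2 * F. The function name, the units and the example numbers are assumptions made for this illustration; a real analysis would use the cosmological luminosity distance and correct for the redshifted energy band, both of which are left out here.

```python
import math

MPC_IN_CM = 3.0857e24  # one megaparsec expressed in centimetres


def xray_luminosity(flux_erg_s_cm2, distance_mpc):
    """Return luminosity in erg/s from an observed flux and a distance.

    Hypothetical helper: assumes the source radiates equally in all
    directions and ignores cosmological corrections.
    """
    distance_cm = distance_mpc * MPC_IN_CM
    return 4.0 * math.pi * distance_cm ** 2 * flux_erg_s_cm2


# Made-up example values, not measurements of any real cluster.
example_luminosity = xray_luminosity(1e-12, 100.0)
print(f"{example_luminosity:.2e} erg/s")
```

Because the luminosity scales with the square of the distance, an error in the distance measurement feeds directly into the brightness used by the mass model.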

Slightly simplified, theory shows that there is a linear dependence between the logarithm of the mass and the logarithm of the absolute X-ray brightness. On this basis, a predictive model can be built using weighted linear regression between the logarithms of mass and absolute brightness. Here, 75 galaxy clusters were used as example data. The model shows a strong dependence between brightness and mass, and the accuracy of the model is consistent with the theory. Therefore, the model can be used to predict the mass of a galaxy cluster from its brightness. An analysis meeting full scientific standards would also require, for example, a more detailed treatment of measurement errors in the data and of selection bias in the targets.
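To make the idea concrete, here is a minimal sketch of a weighted linear regression in log-log space, written in plain Python rather than SPSS Modeler. It is not the model described above: the data are synthetic, the "true" slope and intercept are arbitrary values chosen only for the demo, and the weights are assumed to be the inverse variances of the log-mass measurements.

```python
import math
import random


def fit_log_log(luminosities, masses, log_mass_errors):
    """Weighted least-squares fit of log10(M) = intercept + slope * log10(L)."""
    xs = [math.log10(lum) for lum in luminosities]
    ys = [math.log10(mass) for mass in masses]
    ws = [1.0 / err ** 2 for err in log_mass_errors]  # inverse-variance weights
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict_mass(luminosity, intercept, slope):
    """Predict a mass from a luminosity using the fitted log-log relation."""
    return 10.0 ** (intercept + slope * math.log10(luminosity))


# Synthetic demo data: 75 fake clusters with arbitrary "true" parameters.
rng = random.Random(0)
TRUE_INTERCEPT, TRUE_SLOPE = -12.0, 0.6  # illustration only, not measured values
lums, masses, errs = [], [], []
for _ in range(75):
    log_lum = rng.uniform(43.0, 45.5)
    err = rng.uniform(0.05, 0.2)  # assumed uncertainty on log10(mass)
    log_mass = TRUE_INTERCEPT + TRUE_SLOPE * log_lum + rng.gauss(0.0, err)
    lums.append(10.0 ** log_lum)
    masses.append(10.0 ** log_mass)
    errs.append(err)

intercept, slope = fit_log_log(lums, masses, errs)
print(f"fitted slope {slope:.2f}, intercept {intercept:.2f}")
print(f"predicted mass at L = 1e44: {predict_mass(1e44, intercept, slope):.2e}")
```

Weighting by inverse variance lets the better-measured clusters pull harder on the fitted line, which is one simple way to account for the uneven quality of mass measurements.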

This example shows that predictive analytics, and the SPSS Modeler software used by Houston Analytics, can be applied to a huge range of different problems. Houston Analytics also offers analytics and data-driven management solutions for more everyday application areas than the ones presented here.