Deep machine learning in Cosmology
Chaire Galaxies et Cosmologie
Françoise Combes, November 2018
[Title image: the galaxy cluster Abell 2218]
The exponential increase of data
[Figure: area of surveys versus year, + IFUs; from Brinkman, Huertas-Company]
Also in numerical simulations: sizes of the simulations, number of particles or resolution elements, + physical processes
Illustris project Genel et al 2014
Number of publications on machine learning: NASA-ADS refereed publications with the keywords « Machine learning neurons ». A recent explosion in the last 4 years
Since 2012, machines can separate cats and dogs!
[Plot: error (%) in the image challenge; Russakovsky et al 2014]
Powerful telescopes in space and on the ground
Euclid, ESA satellite, launch 2021: main goal Dark Energy; 15 000 deg², 12 billion galaxies
WFIRST from NASA: Dark Energy & exoplanets (~2027-30?)
LSST, Large Synoptic Survey Telescope: wide-field 8 m in Chile; 20 TB/night; all sky observed every 3 days; millions of alerts/night
SKA, Square Kilometer Array: antenna arrays in Australia and South Africa, several frequencies, λ = 2 cm to 6 m; Petabytes/sec, 100-Petaflops machines
Big data management: a huge challenge
For SKA: Petabytes/sec; Petaflops machines working continuously (10⁸ PCs); a few exabytes/h; dishes = 10x the global internet, phased arrays = 100x the global internet traffic!
LSST: more than half of the cost! 1-2 million alerts per night, available in 60 sec; 15 TB/night; every 3 days, all the sky observed (20 000 deg²); 3200 Mpixels, 10 deg², 15 s per exposure
Euclid: 100 GB/day, but spectroscopy from the ground
LSST
Classification of Hubble-Sandage (a few 100)
High-z
Ellipticals … Bulge-dominated … Spirals … Irregulars
From Hubble sequence to red sequence: a change of paradigm
Main parameters: SFR; also SFH, dust, age, metallicity
2 formation mechanisms; critical mass 3×10¹⁰ M⊙
Color-magnitude diagram: 150 000 galaxies in the SDSS
[Diagram: blue cloud and red sequence; Baldry et al 2004, Schawinski et al 2014]
Galaxy Zoo: citizen science
The first part received millions of classifications (since 2007): SDSS, then CANDELS, DECaLS, computer images, GAMA, KiDS
Now Galaxy Zoo 4 (7/2017): 1 million galaxies to classify
Archives of publications (55)
Angular momentum: one of the main factors of galaxy fate, in addition to environment (over-density)
Galaxy classification methods
CNN: Convolutional Neural Network
Entering into the black box: first step, first layer: 32 filters of the image
Each filter is contrast-normalized individually
See e.g. the filters for edge detection, and how they vary with color
Dieleman 2015
Convolution
Downsampling, Pooling
Dominguez-Sanchez 2018
After several layers..
Dieleman 2015
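To make the convolution + pooling structure above concrete, here is a minimal sketch of a small convolutional classifier for galaxy cutouts in Python (Keras). It is not the actual architecture of Dieleman 2015 or Dominguez-Sanchez 2018; the input size (64×64×3), the number of classes and the layer sizes are illustrative assumptions.

```python
# A minimal sketch of a convolutional classifier for galaxy image cutouts.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_galaxy_cnn(input_shape=(64, 64, 3), n_classes=3):
    """Stack of convolution + pooling layers, then a dense classifier head."""
    model = models.Sequential([
        # First layer: 32 learned filters, as in the example discussed above
        layers.Conv2D(32, (5, 5), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),          # downsampling / pooling
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_galaxy_cnn()
model.summary()
```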
Results on galaxy classification: application to 600 000 galaxies from SDSS
[Panels: probability maps P_bulge, P_cigar, P_edge]
Dominguez-Sanchez et al 2018
Test on numerical simulations: morphology, types, Sersic index, etc.
Comparable results, but the CNN runs 10⁴ times faster than GALFIT (1 sec instead of 3.5 h for 1000 objects)
Tuccillo et al 2018
How big should the sample be? Normally, the bigger the better, especially for complex shapes to recognize (in everyday life: landscapes, animals, scenarios…). However, galaxies are simple objects!
Huertas-Company 2018
It is possible to enlarge training sets: the morphology of an object should not depend on orientation, so rotations and translations (+ crops) can be applied to enlarge the set (as in the sketch below)
Dieleman 2015
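A minimal augmentation sketch in Python, assuming square image cutouts; the number of copies, the shift range and the crop size are illustrative, not the settings of Dieleman 2015.

```python
# Since galaxy morphology does not depend on orientation, each cutout can be
# rotated, shifted, flipped and cropped to multiply the training set.
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(0)

def augment(image, n_copies=8, max_shift=4, crop=8):
    """Return n_copies randomly rotated/translated/cropped versions of `image`."""
    out = []
    for _ in range(n_copies):
        angle = rng.uniform(0.0, 360.0)               # random rotation
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        aug = rotate(image, angle, reshape=False, mode="nearest")
        aug = shift(aug, (dy, dx), mode="nearest")     # small translation
        if rng.random() < 0.5:                         # random flip
            aug = np.fliplr(aug)
        out.append(aug[crop:-crop, crop:-crop])        # central crop
    return np.stack(out)

cutout = rng.normal(size=(64, 64))                     # stand-in for a galaxy image
augmented = augment(cutout)
print(augmented.shape)                                 # (8, 48, 48)
```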
Galaxy Zoo 2
Willett et al 2013
Generative Adversarial Network: degrading the image to train the machine (sketched below)
Schawinski et al 2017
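A sketch of the "degrade, then learn to recover" setup: training pairs are built by convolving clean images with a broader PSF and adding noise, and the GAN is then trained to map degraded images back to the originals. Only the pair construction is shown; the PSF width and noise level are illustrative assumptions, not the values used by Schawinski et al 2017.

```python
# Build (degraded, original) training pairs for an image-recovery GAN.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

def degrade(image, psf_sigma=2.5, noise_sigma=0.05):
    """Blur with a Gaussian PSF and add white noise."""
    blurred = gaussian_filter(image, sigma=psf_sigma)
    return blurred + rng.normal(0.0, noise_sigma, size=image.shape)

clean_images = rng.normal(size=(16, 64, 64))           # stand-in for SDSS cutouts
pairs = [(degrade(img), img) for img in clean_images]  # (input, target) for the GAN
```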
GAN performance
GAN trained on 4500 SDSS galaxies → promising for billions
Three cases of failure (rarity, noise)
PSF-GAN: to subtract AGN point sources automatically
Much faster than GALFIT, and with much smaller errors!
Point sources are added artificially for training
Less sensitive to deformation of the PSF
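A sketch of how such training data can be built: a scaled PSF (an artificial point source) is added at the galaxy centre, and the network learns to recover the original, AGN-free image. The Gaussian PSF model and the flux ratio are illustrative assumptions, not the PSFGAN recipe.

```python
# Inject an artificial point source to create a (contaminated, clean) training pair.
import numpy as np

def add_point_source(galaxy, psf, flux_ratio=1.0):
    """Add a point source (scaled, centred PSF) on top of a galaxy image."""
    ps = psf / psf.sum() * galaxy.sum() * flux_ratio   # normalise, scale to galaxy flux
    return galaxy + ps

# Toy Gaussian PSF, same size as the galaxy cutout
y, x = np.mgrid[-32:32, -32:32]
psf = np.exp(-(x**2 + y**2) / (2 * 2.0**2))

galaxy = np.random.default_rng(2).normal(1.0, 0.1, size=(64, 64))  # stand-in image
contaminated = add_point_source(galaxy, psf, flux_ratio=0.5)
# Training pair for the network: (contaminated, galaxy)
```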
With HST at high resolution, and known PSF, the quasar is subtracted to see the underlying galaxy
Martel et al 2003
GAN versus parametric tool GALFIT
Stark et al 2018
GAN less sensitive to PSF broadening
Stark et al 2018
PSFGAN uses its knowledge of galaxies
A point source is added to test, and the residuals are compared: the visual structure of galaxies helps PSFGAN for point-source subtraction
Use of Fader networks: manipulating images with sliding attributes, Lample et al 2017
The image is first encoded into a latent representation. The attributes are selected, and a decoder is trained to rebuild the image with other attributes → many applications in galaxy evolution research
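A conceptual sketch of that encoder/decoder structure in Keras: the encoder maps the image to a latent code, and the decoder rebuilds the image from the latent code concatenated with a chosen attribute vector. The adversarial term of Lample et al 2017, which removes attribute information from the latent code, is omitted; the image size and the two attributes are illustrative assumptions.

```python
# Fader-style encoder/decoder: reconstruct an image from (latent code, attributes).
import tensorflow as tf
from tensorflow.keras import layers, Model

img_in = layers.Input(shape=(64, 64, 3), name="image")
attr_in = layers.Input(shape=(2,), name="attributes")   # e.g. [sSFR, dust]

# Encoder: image -> latent code
x = layers.Conv2D(32, 4, strides=2, padding="same", activation="relu")(img_in)
x = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(x)
latent = layers.Flatten()(x)
latent = layers.Dense(128, activation="relu", name="latent")(latent)

# Decoder: [latent, attributes] -> reconstructed image
z = layers.Concatenate()([latent, attr_in])
z = layers.Dense(16 * 16 * 64, activation="relu")(z)
z = layers.Reshape((16, 16, 64))(z)
z = layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu")(z)
out = layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="sigmoid")(z)

fader = Model([img_in, attr_in], out)
fader.compile(optimizer="adam", loss="mse")   # reconstruction loss only in this sketch
fader.summary()
```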
Exploring galaxy evolution with generative models
Schawinski et al 2018
Test models to quench star formation: what are the salient characteristics changing from A to B? How does the evolution from the blue cloud to the red sequence proceed? M(halo), environment, sSFR, gas mass, etc.
Schawinski et al 2018
The machine selects the sSFR fader over the dust fader to make a satellite
Weak lensing & cosmological parameters
The primordial fluctuations seen in the Cosmic Microwave Background are Gaussian: the average intensity and dispersion, with the power spectrum versus scale, are sufficient
However, lensing produces non-Gaussianities; also a non-Gaussianity index → seeds of cosmic structures, inflation
Gupta et al 2018, deep learning
But with a CNN of different architecture, it is possible to do better: Ribli et al 2018, CNN, Ωm = 0.26, σ8 = 0.8
Weak lensing and machine learning
Cosmic shear on 10⁶ galaxies: CFHTLenS, DES, and KiDS-450
In the future, 10⁹ galaxies: Euclid, SKA, LSST, WFIRST…
Not only 2-point correlations! Haiman 2018
Entering the CNN black box
In the learning phase, the convolution kernels and the corresponding weights are progressively adjusted to optimize the loss function
The network has learned to use the shapes of lensing peaks and their gradients → Ribli et al 2018 propose a new statistic, the number of peaks with a given gradient, which out-performs all previous statistics (see the sketch below)
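A sketch of such a peak statistic, following the idea described above: count the local maxima of the convergence map in bins of the local gradient magnitude at the peak. The map, the smoothing scale and the binning are illustrative stand-ins, not the Ribli et al 2018 settings.

```python
# Histogram of lensing peaks as a function of the local gradient magnitude.
import numpy as np
from scipy.ndimage import maximum_filter, gaussian_filter

def peak_gradient_counts(kappa, n_bins=10):
    """Count local maxima of the convergence map, binned by gradient magnitude."""
    smoothed = gaussian_filter(kappa, sigma=1.0)
    # a pixel is a peak if it equals the maximum of its 3x3 neighbourhood
    peaks = smoothed == maximum_filter(smoothed, size=3)
    gy, gx = np.gradient(smoothed)
    grad = np.sqrt(gx**2 + gy**2)
    counts, edges = np.histogram(grad[peaks], bins=n_bins)
    return counts, edges

kappa = np.random.default_rng(3).normal(size=(256, 256))   # stand-in convergence map
counts, edges = peak_gradient_counts(kappa)
print(counts)
```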
Galaxy metallicity
Goal: determine the metallicity Z = 12 + log(O/H) from only 3 bands, gri from SDSS (128×128 images); better with high resolution
Excellent mass-metallicity relation
It seems that the CNN has learned a way to deduce Z from galaxy morphology
96 000 images, CNN with 34 layers
Wu & Boada 2018
Galaxy mass predicted
RMSE: Root Mean Square Error
NMAD: Normalized Median Absolute Deviation (insensitive to outliers; see the sketch below)
Wu & Boada 2018
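The two scatter metrics quoted above, as a minimal sketch under one common convention (NMAD = 1.4826 × median absolute deviation): RMSE is sensitive to outliers, the NMAD is not. The numbers below are illustrative, not the Wu & Boada 2018 results.

```python
# RMSE and NMAD for a set of predicted versus true values.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def nmad(y_true, y_pred):
    """1.4826 * median(|residual - median(residual)|), robust to outliers."""
    residual = y_pred - y_true
    return 1.4826 * np.median(np.abs(residual - np.median(residual)))

y_true = np.array([9.0, 9.5, 10.0, 10.5, 11.0])
y_pred = np.array([9.1, 9.4, 10.2, 10.4, 12.0])      # last point is an outlier
print(rmse(y_true, y_pred), nmad(y_true, y_pred))
```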
Galaxy evolution models: semi-analytical models of galaxy evolution since z=4, with radial transport of gas and stars within galactic discs (axisymmetric)
Even axisymmetric, there are many parameters: gas content, fH2, SFR, M*, metallicity (Z*, Zg), radius, surface density, velocity, etc. → neural network
Forbes et al 2018
Too large a number of parameters → the neural network quickly returns the physical parameters of the various models, to be compared with observations (see the sketch below)
Forbes et al 2018
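A minimal emulator sketch: a small fully connected network is trained to map the model's input parameters to the predicted observables, so that many parameter combinations can be evaluated quickly. The parameter and output counts, and the random stand-in training set, are illustrative assumptions, not the Forbes et al 2018 setup.

```python
# Neural-network emulator: model parameters -> summary observables.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
n_params, n_outputs = 8, 5

# Stand-in training set: inputs = model parameters, outputs = summary observables
X = rng.uniform(size=(2000, n_params))
Y = np.column_stack([X @ rng.normal(size=n_params) + 0.1 * rng.normal(size=2000)
                     for _ in range(n_outputs)])

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
emulator.fit(X, Y)

# Once trained, new parameter sets are evaluated almost instantly
theta_new = rng.uniform(size=(1, n_params))
print(emulator.predict(theta_new))
```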
Modified gravity f(R), + sterile neutrinos
Discrimination between a dozen models, through a CNN
Much better results with the CNN than with a nearest-neighbour search, from a series of numerical simulations of the models, and in particular their weak lensing maps. Size counts!
Power spectrum, peak counts and Minkowski functionals are combined into a joint feature vector, to make a classical estimator of the statistics (see the sketch below)
Merten et al 2018
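A sketch of the "classical" baseline mentioned above: summary statistics (power spectrum, peak counts, Minkowski functionals) concatenated into a joint feature vector and fed to a nearest-neighbour classifier over the grid of models. The features here are random stand-ins; only the bookkeeping of the joint feature vector is illustrated.

```python
# Joint feature vector of summary statistics + nearest-neighbour model classifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
n_maps, n_models = 600, 12

power_spectrum = rng.normal(size=(n_maps, 20))     # binned P(l)
peak_counts = rng.normal(size=(n_maps, 10))        # peak-height histogram
minkowski = rng.normal(size=(n_maps, 15))          # V0, V1, V2 functionals

features = np.hstack([power_spectrum, peak_counts, minkowski])  # joint feature vector
labels = rng.integers(0, n_models, size=n_maps)                 # which model made the map

clf = KNeighborsClassifier(n_neighbors=5).fit(features, labels)
print(clf.score(features, labels))
```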
Modified gravity, sterile neutrinos: discrimination between a dozen models, through a CNN
Loss function versus training epoch
Peel et al 2018
Noise influence
Summary
Deep machine learning techniques are blooming in many astronomical domains → they will be mandatory with future instruments (Euclid, SKA, …)
Galaxy classification
Weak lensing, cluster finding in cosmology
IFU galaxy kinematics: CALIFA, SAMI (10³), MaNGA (10⁴), Hector (10⁵)
Modified gravity, or Dark Matter, Dark Energy
GalaxyGAN, Space.ml, ETH Zürich, K. Schawinski