{ "cells": [ { "cell_type": "markdown", "id": "5963d3f1", "metadata": {}, "source": [ "# 4. BayesLOSVD and DYNAMITE\n", "\n", "This notebook will show how to run DYNAMITE orbit-based models using [BayesLOSVD](https://github.com/jfalconbarroso/BAYES-LOSVD) output. BayesLOSVD is a Python library for the non-parametric extraction of the Line-Of-Sight Velocity Distributions (LOSVDs) in galaxies. Rather than describe the LOSVD using Gauss-Hermite expansion, BayesLOSVD represents the LOSVD in a *histogram* format.\n", "\n", "1. [An example galaxy: NGC4550](#NGC4550)\n", "2. [Data preparation](#datprep)\n", "3. [A look at the BayesLOSVD data](#data_firstlook)\n", "4. [Running DYNAMITE](#run_dyn)\n", "\n", "\n", "## An example galaxy: NGC4550\n", "\n", "We will model the galaxy NGC4550, which has a complex LOSVD. The following is taken from Figure 7 of Falcón-Barroso & Martig 2020, showing LOSVDs extracted using BayesLOSVD at three regions,\n", "\n", "\n", "\n", "We see clearly bimodal LOSVDs in the galaxy outskirts, indicating a strong counter-rotating stellar component.\n", "\n", "\n", "## Data preparation\n", "\n", "We require the ``*_results.hdf5`` output file from BayesLOSVD. Two such files for NGC4550 have been kindly provided by Jesus Falcón-Barroso, and can be found in the directory `NGC4550_input`. They differ in the form of LOSVD regularisation used,\n", "- `NGC4550_SAURON-SP_results.hdf5`: using simplex regularisation\n", "- `NGC4550_SAURON-RW_results.hdf5`: using random walk regularisation\n", "\n", "These ``*_results.hdf5`` files contains LOSVD constraints for all spatial regions. For more details, and instructions on creating your own ``*_results.hdf5`` files, see the [BayesLOSVD](https://github.com/jfalconbarroso/BAYES-LOSVD) documentation.\n", "\n", "To prepare this data for use in DYNAMITE, we must\n", "\n", "1. convert kinematics to Astropy ECSV file format\n", "2. add the PSF to the kinematics file header\n", "3. create the auxillary `aperture.dat` and `bins.dat` files\n", "4. get MGE data (in Astropy ECSV format)\n", "\n", "This can be done as follows," ] }, { "cell_type": "code", "execution_count": null, "id": "psychological-secretariat", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import stats\n", "import dynamite as dyn\n", "print('Using DYNAMITE version:', dyn.__version__)\n", " \n", "BayesLOSVD = dyn.kinematics.BayesLOSVD(weight=1.,\n", " hist_width=1,\n", " hist_center=0,\n", " hist_bins=0,\n", " type='BayesLOSVD')\n", "infile = 'NGC4550_input/NGC4550_SAURON-SP_results.hdf5'\n", "outfile = 'NGC4550_input/dynamite_input/bayes_losvd_kins.ecsv'\n", "BayesLOSVD.write_losvds_to_ecsv_format(infile, outfile=outfile)" ] }, { "cell_type": "markdown", "id": "knowing-presentation", "metadata": {}, "source": [ "We next add the PSF to the kinematics. The seeing - from Table 3 of [Emsellem et al 2004](https://academic.oup.com/mnras/article/352/3/721/1210712) - has FWHM of 2.1 arcsec. We convert this to a Gaussian sigma, and add to the header of the kinematics file," ] }, { "cell_type": "code", "execution_count": null, "id": "plain-termination", "metadata": {}, "outputs": [], "source": [ "seeing_fwhm = 2.1\n", "seeing_gauss_sigma = seeing_fwhm/2.35\n", "# add the psf to file header\n", "BayesLOSVD.add_psf_to_datafile(sigma=[seeing_gauss_sigma],\n", " weight=[1.],\n", " datafile=outfile)\n", "\n", "# re-create the BayesLOSVD object reading in the complete kinematics file\n", "BayesLOSVD = dyn.kinematics.BayesLOSVD(datafile=outfile,\n", " weight=1.,\n", " type='BayesLOSVD')" ] }, { "cell_type": "markdown", "id": "offensive-return", "metadata": {}, "source": [ "Next create the auxillary `aperture.dat` and `bins.dat` files. This will require the galaxy's position angle. Table 3 of [Emsellem et al 2004](https://academic.oup.com/mnras/article/352/3/721/1210712), gives PA = 0." ] }, { "cell_type": "code", "execution_count": null, "id": "deadly-purpose", "metadata": {}, "outputs": [], "source": [ "position_angle = 0.\n", "angle_deg = 90. - position_angle\n", "dyn_input_direc = 'NGC4550_input/dynamite_input/'\n", "BayesLOSVD.write_aperture_and_bin_files(filename=infile,\n", " angle_deg=angle_deg,\n", " aperture_filename=dyn_input_direc+'aperture.dat',\n", " bin_filename=dyn_input_direc+'bins.dat')" ] }, { "cell_type": "markdown", "id": "sunset-coach", "metadata": {}, "source": [ "Finally we need the Multi Gaussian Expansion. MGEs for Atlas3D galaxies (including NGC4550) can be found under the link for the *MGE parameters for the deconvolved r-band surface brightness* on the [Atlas3D page](http://www-astro.physics.ox.ac.uk/atlas3d/). The MGE NGC4550 is provided in ``NGC4550_input/dynamite_input/``. We have already manually copied and pasted the MGE-data into the required Astropy ECSV format.\n", "\n", "The input directory `NGC4550_input/dynamite_input/` now contains all 4 required files:\n", "- ``mge.ecsv``\n", "- ``bayes_losvd_kins.ecsv``\n", "- ``aperture.dat``\n", "- ``bins.dat``\n", "\n", "\n", "## A look at the BayesLOSVD data\n", "\n", "We can take a look at the BayesLOSVD data. The data can be found in the file `NGC4550_input/dynamite_input/bayes_losvd_kins.ecsv`, which we defined as `outfile` above. First let's read the data-files and auxiliary files, into a `dyn.kinematics.BayesLOSVD` objects," ] }, { "cell_type": "code", "execution_count": null, "id": "compatible-rates", "metadata": {}, "outputs": [], "source": [ "BayesLOSVD = dyn.kinematics.BayesLOSVD(datafile=outfile,\n", " aperturefile=dyn_input_direc+'aperture.dat',\n", " binfile=dyn_input_direc+'bins.dat',\n", " weight=1.,\n", " type='BayesLOSVD')" ] }, { "cell_type": "markdown", "id": "cosmetic-yellow", "metadata": {}, "source": [ "The table has one row per spatial (in this case Voronoi) bin, and the following columns:\n", "\n", "- binID_BayesLOSVD: the bin ID used internally in BayesLOSVD \n", "- binID_dynamite: the bin ID used internally in DYNAMITE \n", "- v: the mean velocity in the bin\n", "- sigma: the velocity dispersion in the bin\n", "- xbin/ybin : the co-ords of the bin centers\n", "- losvd: array holding the LOSVD \n", "- dlosvd: array holding LOSVD uncertainties, given by 68% credible intervals per velocity bin (assumed independent)\n", "\n", "BayesLOSVD represents the LOSVD as a sequence of weights. Details of the velocity array are stored as metadata to the data table. Details of the LOSVD velocity array are stored as metadata to the data table," ] }, { "cell_type": "code", "execution_count": null, "id": "covered-palestine", "metadata": {}, "outputs": [], "source": [ "print('Bayes-LOSVD table contains meta-data about: ', BayesLOSVD.data.meta.keys())\n", "print(' velocity spacing of bins = ', BayesLOSVD.data.meta['dv'])\n", "print(' number of spatial bins = ', BayesLOSVD.data.meta['nbins'])\n", "print(' number of velocty bins = ', BayesLOSVD.data.meta['nvbins'])\n", "print(' velocity dispersion in the bins = ', BayesLOSVD.data.meta['PSF']['sigma'])\n", "print(' etc...')" ] }, { "cell_type": "markdown", "id": "romance-train", "metadata": {}, "source": [ "To plot kinematic maps, we can get get the `map_plotter` function from the kinematics object," ] }, { "cell_type": "code", "execution_count": null, "id": "adopted-router", "metadata": {}, "outputs": [], "source": [ "map_plotter = BayesLOSVD.get_map_plotter()\n", "map_plotter(BayesLOSVD.data['v'],\n", " cmap='sauron',\n", " vmin=-100,\n", " vmax=100,\n", " colorbar=True)" ] }, { "cell_type": "markdown", "id": "innocent-soldier", "metadata": {}, "source": [ "The velocity map is not centered at 0. Let's center at the galaxy's systemic velocity. We could either provide this directly, or specify that we want the `flux_weighted` mean velocity calculated from the kinematic data itself," ] }, { "cell_type": "code", "execution_count": null, "id": "eleven-indian", "metadata": {}, "outputs": [], "source": [ "BayesLOSVD.center_v_systemic(v_systemic='flux_weighted')\n", "BayesLOSVD.save_data_table(outfile)" ] }, { "cell_type": "code", "execution_count": null, "id": "beautiful-click", "metadata": {}, "outputs": [], "source": [ "map_plotter(BayesLOSVD.data['v'],\n", " cmap='sauron',\n", " vmin=-70,\n", " vmax=70,\n", " colorbar=True)" ] }, { "cell_type": "markdown", "id": "proof-opera", "metadata": {}, "source": [ "And we can pick a specific bin, and plot the LOSVD there," ] }, { "cell_type": "code", "execution_count": null, "id": "muslim-institute", "metadata": {}, "outputs": [], "source": [ "bin_id = 105\n", "\n", "losvd_i = BayesLOSVD.data['losvd'][bin_id,:]\n", "dlosvd_i = BayesLOSVD.data['dlosvd'][bin_id,:]\n", "vcent = BayesLOSVD.data.meta['vcent']\n", "plt.plot(vcent, losvd_i, '-k', label='median')\n", "plt.plot(vcent, losvd_i+dlosvd_i, ':k', label='68% cred. ints.')\n", "plt.plot(vcent, losvd_i-dlosvd_i, ':k')\n", "plt.gca().set_xlabel('Velocity [km/s]')\n", "plt.gca().set_ylabel('LOSVD [km/s]')\n", "_ = plt.gca().set_title(f'LOSVD with 68% CIs in bin {bin_id}')\n", "plt.gca().legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "public-utilization", "metadata": {}, "source": [ "For `bin_id=105` we see that the galaxy has a bimodal LOSVD (albeit with quite large uncertainties).\n", "\n", "The following are technical notes about BayesLOSVD data. If you're mainly interested in running DYNAMITE, you can skip below to the section [*Running DYNAMITE*](#run_dyn).\n", "\n", "#### Note 1: bin ID's\n", "\n", "The differences between `binID_BayesLOSVD` and `binID_DYNAMITE` are that: \n", "- `binID_BayesLOSVD` are zero-indexed, `binID_DYNAMITE` one-indexed. This is for compatibility with Fortran code,\n", "- `binID_BayesLOSVD` may have some gaps (as some bins may be masked) while DYNAMITE assumes the binIDs increase without gaps.\n", "You shouldn't need to worry about this difference as bin IDs are handled internally, but be aware of the difference for any later analysis.\n", "\n", "#### Note 2: LOSVD normalisations\n", "\n", "BayesLOSVD samples the multi-dimensional posterior on the LOSVD weights $L_i$. The full multidimensional posterior is not saved by defaut, however. What is saved, and what is given in the table above, is the median and 68% Bayesian credible intervals for the weight in each velocity bin.\n", "\n", "One important effect of this is on LOSVD normalisation. The posterior samples created by BayesLOSVD are normnalised to 1 i.e. \n", "\n", "$$\n", "\\sum_i L_i = 1\n", "$$\n", "\n", "(this is acheieved by using the `simplex` datatype in STAN). The default results file, however, gives median values per velocity bin. These are *not* normalised the same way. Let's call these median value,\n", "\n", "$$\n", "l_i = \\mathrm{median}(L_{i})\n", "$$\n", "\n", "and look at a histogram of the sums of the $l_i$ in each spatial bin," ] }, { "cell_type": "code", "execution_count": null, "id": "twelve-minister", "metadata": {}, "outputs": [], "source": [ "plt.hist(np.sum(BayesLOSVD.data['losvd'], 1))\n", "plt.axvline(1., ls='--', color='r')\n", "plt.gca().set_xlabel('$\\sum_i l_{i}$')\n", "plt.gca().set_ylabel('$N$')\n", "plt.gca().set_xlim(0, 1.1)\n", "_ = plt.gca().set_title('LOSVD normalisation in different spatial bins')" ] }, { "cell_type": "markdown", "id": "graduate-spare", "metadata": {}, "source": [ "This peaks around 0.9 and, in some cases, is as low as 0.5. \n", "\n", "#### Note 3: calculating moments\n", "\n", "When calculating LOSVD moments, we must acccount for the fact that LOSVDs are not normalised. Let's define the *normalised LOSVD* as\n", "\n", "$$\n", "\\hat{l}_{i} = \\frac{l_{i}}{\\sum_i l_{i}}\n", "$$\n", "\n", "The quantity $\\hat{l}_{i}$ is what we have used to make point estimates of the the LOSVD mean and standard deviation, which appear in the `v` and `sigma` columns of the data-table,\n", "\n", "$$\n", "v = \\sum_i v_i * \\hat{l}_{i}\n", "$$\n", "\n", "$$\n", "\\sigma = \\left[ \\sum_i (v_i-v)^2 * \\hat{l}_{i} \\right]^{\\frac{1}{2}}\n", "$$\n", "\n", "#### Note 4: inflated velocity dispersions\n", "\n", "Let's compare the BayesLOSVD result with a Gaussian LOSVD approximation in a particular bin," ] }, { "cell_type": "code", "execution_count": null, "id": "adjustable-teacher", "metadata": {}, "outputs": [], "source": [ "bin_id = 100\n", "\n", "# plot BayesLOSVD result\n", "losvd_i = BayesLOSVD.data['losvd'][bin_id,:]\n", "dlosvd_i = BayesLOSVD.data['dlosvd'][bin_id,:]\n", "vcent = BayesLOSVD.data.meta['vcent']\n", "plt.plot(vcent, losvd_i, '-k', label='median')\n", "plt.plot(vcent, losvd_i+dlosvd_i, ':k', label='68% cred. ints.')\n", "plt.plot(vcent, losvd_i-dlosvd_i, ':k')\n", "\n", "# plot Gaussian approximation\n", "v = BayesLOSVD.data['v'][bin_id]\n", "sigma = BayesLOSVD.data['sigma'][bin_id]\n", "nrm = stats.norm(v, sigma)\n", "pdf = nrm.pdf(vcent)\n", "plt.gca().plot(vcent, pdf/np.sum(pdf), color='r', ls='--', label='Gauss. approx.')\n", "\n", "# labels\n", "plt.gca().set_xlabel('Velocity [km/s]')\n", "plt.gca().set_ylabel('LOSVD [km/s]')\n", "_ = plt.gca().set_title(f'LOSVD with 68% CIs in bin {bin_id}')\n", "plt.gca().legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "young-melissa", "metadata": {}, "source": [ "The Gaussian approximation is much wider then the median BayesLOSVD result. Why is this? Let's look at the this plot again, this time on a log-scale," ] }, { "cell_type": "code", "execution_count": null, "id": "rational-ratio", "metadata": {}, "outputs": [], "source": [ "bin_id = 100\n", "\n", "# plot BayesLOSVD result\n", "losvd_i = BayesLOSVD.data['losvd'][bin_id,:]\n", "dlosvd_i = BayesLOSVD.data['dlosvd'][bin_id,:]\n", "vcent = BayesLOSVD.data.meta['vcent']\n", "plt.plot(vcent, losvd_i, '-k', label='median')\n", "plt.plot(vcent, losvd_i+dlosvd_i, ':k', label='68% cred. ints.')\n", "plt.plot(vcent, losvd_i-dlosvd_i, ':k')\n", "\n", "# plot Gaussian approximation\n", "v = BayesLOSVD.data['v'][bin_id]\n", "sigma = BayesLOSVD.data['sigma'][bin_id]\n", "nrm = stats.norm(v, sigma)\n", "pdf = nrm.pdf(vcent)\n", "plt.gca().plot(vcent, pdf/np.sum(pdf), color='r', ls='--', label='Gauss. approx.')\n", "\n", "# labels\n", "plt.gca().set_xlabel('Velocity [km/s]')\n", "plt.gca().set_ylabel('LOSVD [km/s]')\n", "_ = plt.gca().set_title(f'LOSVD with 68% CIs in bin {bin_id}')\n", "plt.gca().legend()\n", "plt.gca().set_yscale('log')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "arbitrary-cartoon", "metadata": {}, "source": [ "We see that BayesLOSVD does not force the LOSVD to decay at large velocities. The fat tails give rise to large standard deviations. Therefore, the `sigma` values in the data table are unrepresentative of the width of the main body of the distribution.\n", "\n", "For this reason, sigma-maps for BayesLOSVD output are not recommended. By default, for BayesLOSVD kinematic maps we will only plot mean velocities. In the future, we may extend this to plot LOSVD quantiles.\n", "\n", "\n", "## Running DYNAMITE \n", "\n", "#### 1. Prepare the configuration file\n", "\n", "Next prepare the DYNAMITE congfiguration file. We've included this file `NGC4550_config.yaml` in this directory. Here, we highlight a few important entries which are specific to fitting BayesLOSVD data. Firstly, we have to specify the type of the kinematics as `BayesLOSVD`, i.e.\n", "\n", " system_components:\n", " ...\n", " stars:\n", " ...\n", " kinematics:\n", " sauron:\n", " weight: 1.0\n", " type: BayesLOSVD\n", " datafile: \"bayes_losvd_kins.ecsv\"\n", " aperturefile: \"aperture.dat\"\n", " binfile: \"bins.dat\"\n", "\n", "and ensure the `type` of the `weight_solver` is `NNLS` (since the alternative, `LegacyWeightSolver` is incompatible with BayesLOSVD data), i.e.\n", "\n", " weight_solver_settings:\n", " type: \"NNLS\"\n", " lum_intr_rel_err: 0.01\n", " sb_proj_rel_err: 0.02\n", " nnls_solver: 'scipy'\n", "\n", "Read the configuration file in," ] }, { "cell_type": "code", "execution_count": null, "id": "precious-thread", "metadata": {}, "outputs": [], "source": [ "c = dyn.config_reader.Configuration('NGC4550_config.yaml', reset_logging=True)" ] }, { "cell_type": "markdown", "id": "flying-efficiency", "metadata": {}, "source": [ "#### 2. Run the models\n", "\n", "This will run two iterations, producing 5 models in total. It should take around 7 minutes with 4 cpus." ] }, { "cell_type": "code", "execution_count": null, "id": "6e35e11e", "metadata": {}, "outputs": [], "source": [ "# delete previous output if available\n", "c.remove_existing_orblibs()\n", "c.remove_existing_all_models_file(wipe_other_files=False)\n", "c.backup_config_file(keep=3, delete_other=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "simplified-final", "metadata": { "scrolled": true }, "outputs": [], "source": [ "import time\n", "\n", "t = time.perf_counter()\n", "\n", "mod_iter = dyn.model_iterator.ModelIterator(config=c)\n", "\n", "delta_t = time.perf_counter()-t\n", "print(f'Computation time: {delta_t} seconds = {delta_t/60} minutes')" ] }, { "cell_type": "markdown", "id": "extreme-laser", "metadata": {}, "source": [ "Look at the summary table," ] }, { "cell_type": "code", "execution_count": null, "id": "banned-treaty", "metadata": { "scrolled": false }, "outputs": [], "source": [ "c.all_models.table" ] }, { "cell_type": "markdown", "id": "nutritional-consultancy", "metadata": {}, "source": [ "#### 3. Look at plots of the results\n", "\n", "Some plots have been automatically created as the models were run, stored in the directory `NGC4550_output/plots/`. Let's look at some. After running the example above, you should find versions of the following plots at the locations below.\n", "\n", "##### Kinematic map\n", "\n", "`NGC4550_output/plots/kinematic_map_sauron.png`\n", "\n", "We can also obtain the plot directly from the model iterator's `get_plots()` method (the kinematics are the return value's third element)," ] }, { "cell_type": "code", "execution_count": null, "id": "1964cc0e", "metadata": {}, "outputs": [], "source": [ "kinmap = mod_iter.get_plots()[2]\n", "print('First (and only) kinematics: ' + kinmap[0][1]) # The name of the kinematics\n", "kinmap[0][0] # The plot" ] }, { "cell_type": "markdown", "id": "c4769e97", "metadata": {}, "source": [ "The right half shows a map of the reduced chi2 of the best fitting orbit-based model to the observed LOSVD. The left half shows nine examples, which have been chosen to span the variety in the observed kinematics. We see that in some locations (e.g. 1, 2, 5, 9) the observed LOSVD is fit well, however in many others it is not. In particular, some bins (e.g. 7, 8) have very *spiky* observed LOSVDs while others (e.g. 3, 6) have peaks at the most negative velocitites which is most likely an artifact. \n", "\n", "##### Chi2 evolution\n", "\n", "`NGC4550_output/plots/kinchi2_progress_plot.png`" ] }, { "cell_type": "code", "execution_count": null, "id": "1b2e755e", "metadata": {}, "outputs": [], "source": [ "mod_iter.get_plots()[0]" ] }, { "cell_type": "markdown", "id": "5dc86152", "metadata": {}, "source": [ "##### Chi2 distribution against model parameters\n", "\n", "`NGC4550_output/plots/kinchi2_plot.png`" ] }, { "cell_type": "code", "execution_count": null, "id": "c0f59cd9", "metadata": {}, "outputs": [], "source": [ "mod_iter.get_plots()[1]" ] }, { "cell_type": "markdown", "id": "9e6c06b7", "metadata": {}, "source": [ "In this case only two parameters have been left free, and only a small handful of models have been evaluated. As more models explore more of parameter space, this plot will begin to look more interesting!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }