ResoKit.datasets Tutorial
This tutorial is a guide to using the ResoKit.datasets package, which provides tools for downloading, loading, and updating the available exoplanet datasets (the EU and NASA catalogues).
In this tutorial we show, step by step, how to get started with the package.
Import ResoKit and the necessary packages
[1]:
import resokit.datasets as datasets
import matplotlib.pyplot as plt
Load the dataset
Let’s start with the EU catalogue.
First, download the dataset (if it has not been downloaded yet).
[2]:
datasets.download('eu', overwrite=False, soft=True)
Checking local dataset from which='eu' source...
File from which='eu' source to check if outdated not found.
Downloading data from https://exoplanet.eu/catalog/csv/...
Note: Progress is shown at every 0.15 MB
Data downloaded successfully. (3.39 MB)
Creating the ZIP archive /home/egianuzzi/.resokit_data/exoplanet_eu.zip...
Written exoplanet_eu.csv to /home/egianuzzi/.resokit_data/exoplanet_eu.zip.
Updated stored index in memory.
Stored dataset in memory.
[2]:
PosixPath('/home/egianuzzi/.resokit_data/exoplanet_eu.zip/exoplanet_eu.csv')
Now, load it.
[3]:
datasets.load('eu')
Loaded full dataset from memory stored datasets.
[3]:
| | name | mass | mass_err_min | mass_err_max | mass_sin_i | mass_sin_i_err_min | mass_sin_i_err_max | radius | radius_err_min | radius_err_max | ... | star_dist_err_max | star_mass | star_mass_err_min | star_mass_err_max | star_radius | star_radius_err_min | star_radius_err_max | n | n_err_min | n_err_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 109 Psc b | 5.743 | 0.28900 | 1.01100 | 6.3830 | 0.07800 | 0.07800 | 1.152 | NaN | NaN | ... | 0.88000 | 1.13 | 0.030 | 0.030 | 1.79000 | 0.1700 | 0.1700 | 0.005843 | 7.853982e+00 | 8.975979e+00 |
| 1 | 112 Psc b | NaN | 0.00500 | 0.00400 | 0.0330 | 0.00500 | 0.00400 | NaN | NaN | NaN | ... | 0.10695 | 1.10 | 0.133 | 0.133 | 1.80100 | 0.0725 | 0.0725 | 1.427997 | 3.141593e+04 | 1.570796e+04 |
| 2 | 112 Psc c | 9.866 | 1.78100 | 3.19000 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.10695 | 1.10 | 0.133 | 0.133 | 1.80100 | 0.0725 | 0.0725 | 0.000173 | 6.460195e-04 | 1.040366e-03 |
| 3 | 11 Com Ab | NaN | 1.53491 | 1.53491 | 16.1284 | 1.53491 | 1.53491 | NaN | NaN | NaN | ... | 10.50000 | 2.70 | 0.300 | 0.300 | 19.00000 | 2.0000 | 2.0000 | 0.019272 | 1.963495e+01 | 1.963495e+01 |
| 4 | 11 UMi b | NaN | 1.10000 | 1.10000 | 11.0873 | 1.10000 | 1.10000 | NaN | NaN | NaN | ... | 6.90000 | 1.80 | 0.250 | 0.250 | 24.08000 | 1.8400 | 1.8400 | 0.012172 | 1.933288e+00 | 1.933288e+00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7599 | ZTF J1406+1222 Ab | 50.000 | NaN | NaN | NaN | NaN | NaN | 0.292 | NaN | NaN | ... | 200.00000 | 1.40 | NaN | NaN | 0.00002 | NaN | NaN | 116.233408 | NaN | NaN |
| 7600 | ZTF J1622+47 b | 61.000 | 19.00000 | 19.00000 | NaN | NaN | NaN | 0.980 | 0.02 | 0.02 | ... | NaN | 0.47 | NaN | NaN | 0.18200 | 0.0040 | 0.0040 | 90.031428 | NaN | NaN |
| 7601 | ZTF J1637+49 b | 23.000 | 8.00000 | 8.00000 | NaN | NaN | NaN | 0.680 | 0.07 | 0.07 | ... | 8.00000 | 0.90 | 0.050 | 0.050 | 0.00900 | 0.0010 | 0.0010 | 146.120589 | NaN | NaN |
| 7602 | ZTF J1828+2308 b | 19.500 | 0.80000 | 0.80000 | NaN | NaN | NaN | 1.020 | 0.02 | 0.02 | ... | 5.00000 | 0.61 | 0.040 | 0.040 | 0.01310 | 0.0002 | 0.0002 | 56.096513 | 7.222052e+06 | 7.222052e+06 |
| 7603 | ZTF J2252-05 b | 26.000 | 8.00000 | 8.00000 | NaN | NaN | NaN | 0.490 | 0.04 | 0.04 | ... | 82.00000 | 0.76 | 0.050 | 0.050 | 0.01000 | 0.0010 | 0.0010 | 261.799388 | NaN | NaN |
Now the dataset is stored in memory, so it can be accessed faster than by reading the file again.
[4]:
eu = datasets.load('eu')
eu
Loaded full dataset from memory stored datasets.
[4]:
| | name | mass | mass_err_min | mass_err_max | mass_sin_i | mass_sin_i_err_min | mass_sin_i_err_max | radius | radius_err_min | radius_err_max | ... | star_dist_err_max | star_mass | star_mass_err_min | star_mass_err_max | star_radius | star_radius_err_min | star_radius_err_max | n | n_err_min | n_err_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 109 Psc b | 5.743 | 0.28900 | 1.01100 | 6.3830 | 0.07800 | 0.07800 | 1.152 | NaN | NaN | ... | 0.88000 | 1.13 | 0.030 | 0.030 | 1.79000 | 0.1700 | 0.1700 | 0.005843 | 7.853982e+00 | 8.975979e+00 |
| 1 | 112 Psc b | NaN | 0.00500 | 0.00400 | 0.0330 | 0.00500 | 0.00400 | NaN | NaN | NaN | ... | 0.10695 | 1.10 | 0.133 | 0.133 | 1.80100 | 0.0725 | 0.0725 | 1.427997 | 3.141593e+04 | 1.570796e+04 |
| 2 | 112 Psc c | 9.866 | 1.78100 | 3.19000 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.10695 | 1.10 | 0.133 | 0.133 | 1.80100 | 0.0725 | 0.0725 | 0.000173 | 6.460195e-04 | 1.040366e-03 |
| 3 | 11 Com Ab | NaN | 1.53491 | 1.53491 | 16.1284 | 1.53491 | 1.53491 | NaN | NaN | NaN | ... | 10.50000 | 2.70 | 0.300 | 0.300 | 19.00000 | 2.0000 | 2.0000 | 0.019272 | 1.963495e+01 | 1.963495e+01 |
| 4 | 11 UMi b | NaN | 1.10000 | 1.10000 | 11.0873 | 1.10000 | 1.10000 | NaN | NaN | NaN | ... | 6.90000 | 1.80 | 0.250 | 0.250 | 24.08000 | 1.8400 | 1.8400 | 0.012172 | 1.933288e+00 | 1.933288e+00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7599 | ZTF J1406+1222 Ab | 50.000 | NaN | NaN | NaN | NaN | NaN | 0.292 | NaN | NaN | ... | 200.00000 | 1.40 | NaN | NaN | 0.00002 | NaN | NaN | 116.233408 | NaN | NaN |
| 7600 | ZTF J1622+47 b | 61.000 | 19.00000 | 19.00000 | NaN | NaN | NaN | 0.980 | 0.02 | 0.02 | ... | NaN | 0.47 | NaN | NaN | 0.18200 | 0.0040 | 0.0040 | 90.031428 | NaN | NaN |
| 7601 | ZTF J1637+49 b | 23.000 | 8.00000 | 8.00000 | NaN | NaN | NaN | 0.680 | 0.07 | 0.07 | ... | 8.00000 | 0.90 | 0.050 | 0.050 | 0.00900 | 0.0010 | 0.0010 | 146.120589 | NaN | NaN |
| 7602 | ZTF J1828+2308 b | 19.500 | 0.80000 | 0.80000 | NaN | NaN | NaN | 1.020 | 0.02 | 0.02 | ... | 5.00000 | 0.61 | 0.040 | 0.040 | 0.01310 | 0.0002 | 0.0002 | 56.096513 | 7.222052e+06 | 7.222052e+06 |
| 7603 | ZTF J2252-05 b | 26.000 | 8.00000 | 8.00000 | NaN | NaN | NaN | 0.490 | 0.04 | 0.04 | ... | 82.00000 | 0.76 | 0.050 | 0.050 | 0.01000 | 0.0010 | 0.0010 | 261.799388 | NaN | NaN |
Let’s load the NASA dataset too.
[5]:
datasets.download('nasa', overwrite=False, soft=True)
nasa_full = datasets.load("nasa")
nasa_full
Zip file /home/egianuzzi/.resokit_data/nasa_exoplanets.zip already exists. Set overwrite=True to force the download.
Loading the entire dataset...
Reading nasa.csv directly from /home/egianuzzi/.resokit_data/nasa_exoplanets.zip...
Updated stored index in memory.
Stored dataset in memory.
[5]:
| | name | star_name | default_set | reference | disc_year | disc_method | P | P_err_min | P_err_max | w_err_min | ... | star_radius_err_min | star_radius_err_max | n_stars | n_planets | star_dist | star_dist_err_min | star_dist_err_max | n | n_err_min | n_err_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kepler-6 b | Kepler-6 | 0 | <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... | 2009 | Transit | 3.234699 | 1.130000e-07 | 1.130000e-07 | NaN | ... | 0.093000 | 0.099000 | 1 | 1 | 587.039 | 5.071 | 4.986 | 1.942433 | 5.560341e+07 | 5.560341e+07 |
| 1 | Kepler-6 b | Kepler-6 | 1 | <a refstr=ESTEVES_ET_AL__2015 href=https://ui.... | 2009 | Transit | 3.234700 | 4.000000e-07 | 4.000000e-07 | NaN | ... | 0.017000 | 0.034000 | 1 | 1 | 587.039 | 5.071 | 4.986 | 1.942432 | 1.570796e+07 | 1.570796e+07 |
| 2 | Kepler-6 b | Kepler-6 | 0 | <a refstr=Q1_Q17_DR24_KOI_TABLE href=https://e... | 2009 | Transit | 3.234699 | 1.130000e-07 | 1.130000e-07 | NaN | ... | 0.093000 | 0.099000 | 1 | 1 | 587.039 | 5.071 | 4.986 | 1.942433 | 5.560341e+07 | 5.560341e+07 |
| 3 | Kepler-6 b | Kepler-6 | 0 | <a refstr=EXOFOP_TESS_TOI href=https://exofop.... | 2009 | Transit | 3.234694 | 6.012800e-06 | 6.012800e-06 | NaN | ... | 0.059593 | 0.059593 | 1 | 1 | 587.039 | 5.071 | 4.986 | 1.942436 | 1.044968e+06 | 1.044968e+06 |
| 4 | Kepler-6 b | Kepler-6 | 0 | <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... | 2009 | Transit | 3.234699 | 3.000000e-08 | 3.000000e-08 | NaN | ... | 0.072914 | 0.046272 | 1 | 1 | 587.039 | 5.071 | 4.986 | 1.942433 | 2.094395e+08 | 2.094395e+08 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38495 | Kepler-33 f | Kepler-33 | 0 | <a refstr=HADDEN__AMP__LITHWICK_2016 href=http... | 2011 | Transit | 41.029000 | NaN | NaN | NaN | ... | 0.260000 | 0.420000 | 1 | 5 | 1209.160 | 22.195 | 22.195 | 0.153140 | NaN | NaN |
| 38496 | Kepler-33 f | Kepler-33 | 0 | <a refstr=FULTON__AMP__PETIGURA_2018 href=http... | 2011 | Transit | 41.028064 | 1.385000e-04 | 1.385000e-04 | NaN | ... | 0.047000 | 0.045000 | 1 | 5 | 1209.160 | 22.195 | 22.195 | 0.153144 | 4.536596e+04 | 4.536596e+04 |
| 38497 | Kepler-33 f | Kepler-33 | 0 | <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... | 2011 | Transit | 41.028139 | 9.930000e-06 | 9.930000e-06 | NaN | ... | NaN | NaN | 1 | 5 | 1209.160 | 22.195 | 22.195 | 0.153143 | 6.327478e+05 | 6.327478e+05 |
| 38498 | Kepler-33 f | Kepler-33 | 0 | <a refstr=BERGER_ET_AL__2018 href=https://ui.a... | 2011 | Transit | NaN | NaN | NaN | NaN | ... | 0.076000 | 0.073000 | 1 | 5 | 1209.160 | 22.195 | 22.195 | NaN | NaN | NaN |
| 38499 | Kepler-33 f | Kepler-33 | 0 | <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... | 2011 | Transit | 41.028064 | 1.385000e-04 | 1.385000e-04 | NaN | ... | 0.399000 | 0.381000 | 1 | 5 | 1209.160 | 22.195 | 22.195 | 0.153144 | 4.536596e+04 | 4.536596e+04 |
The NASA dataset contains all exoplanet solutions (including non-default and controversial ones), so we filter them here:
[6]:
nasa = nasa_full[(nasa_full.controversial == 0) & (nasa_full.default_set == 1)]
nasa
[6]:
| | name | star_name | default_set | reference | disc_year | disc_method | P | P_err_min | P_err_max | w_err_min | ... | star_radius_err_min | star_radius_err_max | n_stars | n_planets | star_dist | star_dist_err_min | star_dist_err_max | n | n_err_min | n_err_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Kepler-6 b | Kepler-6 | 1 | <a refstr=ESTEVES_ET_AL__2015 href=https://ui.... | 2009 | Transit | 3.234700 | 4.000000e-07 | 4.000000e-07 | NaN | ... | 0.017 | 0.034 | 1 | 1 | 587.039 | 5.0710 | 4.9860 | 1.942432 | 1.570796e+07 | 1.570796e+07 |
| 25 | Kepler-491 b | Kepler-491 | 1 | <a refstr=MORTON_ET_AL__2016 href=https://ui.a... | 2016 | Transit | 4.225385 | 2.680000e-07 | 2.680000e-07 | NaN | ... | 0.092 | 0.059 | 1 | 1 | 630.785 | 6.4310 | 6.3050 | 1.487009 | 2.344472e+07 | 2.344472e+07 |
| 34 | Kepler-257 b | Kepler-257 | 1 | <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... | 2014 | Transit | 2.382667 | 6.000000e-06 | 6.000000e-06 | NaN | ... | 0.518 | 0.518 | 1 | 3 | 780.256 | 17.5400 | 16.7990 | 2.637039 | 1.047198e+06 | 1.047198e+06 |
| 43 | Kepler-216 b | Kepler-216 | 1 | <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... | 2014 | Transit | 7.693641 | 3.100000e-05 | 3.100000e-05 | NaN | ... | 0.243 | 0.243 | 1 | 2 | 1187.470 | 21.4200 | 21.4200 | 0.816673 | 2.026834e+05 | 2.026834e+05 |
| 67 | Kepler-32 c | Kepler-32 | 1 | <a refstr=FABRYCKY_ET_AL__2012 href=https://ui... | 2011 | Transit | 8.752200 | 3.000000e-04 | 3.000000e-04 | NaN | ... | 0.040 | 0.040 | 1 | 5 | 323.847 | 3.4025 | 3.4025 | 0.717898 | 2.094395e+04 | 2.094395e+04 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38444 | Kepler-561 c | Kepler-561 | 1 | <a refstr=MORTON_ET_AL__2016 href=https://ui.a... | 2016 | Transit | 5.350162 | 4.435000e-06 | 4.435000e-06 | NaN | ... | 0.164 | 0.082 | 1 | 2 | 621.378 | 9.3350 | 9.3350 | 1.174392 | 1.416727e+06 | 1.416727e+06 |
| 38458 | Kepler-215 b | Kepler-215 | 1 | <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... | 2014 | Transit | 9.360672 | 4.000000e-05 | 4.000000e-05 | NaN | ... | 0.236 | 0.236 | 1 | 4 | 485.997 | 3.6540 | 3.6540 | 0.671232 | 1.570796e+05 | 1.570796e+05 |
| 38462 | Kepler-62 b | Kepler-62 | 1 | <a refstr=BORUCKI_ET_AL__2013 href=https://ui.... | 2013 | Transit | 5.714932 | 9.000000e-06 | 9.000000e-06 | NaN | ... | 0.020 | 0.020 | 1 | 5 | 300.874 | 1.2190 | 1.2190 | 1.099433 | 6.981317e+05 | 6.981317e+05 |
| 38477 | Kepler-192 c | Kepler-192 | 1 | <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... | 2014 | Transit | 21.223400 | 7.600000e-05 | 7.600000e-05 | NaN | ... | 0.164 | 0.164 | 1 | 3 | 651.674 | 6.7435 | 6.7435 | 0.296050 | 8.267349e+04 | 8.267349e+04 |
| 38486 | Kepler-33 f | Kepler-33 | 1 | <a refstr=LISSAUER_ET_AL__2012 href=https://ui... | 2011 | Transit | 41.029020 | 4.200000e-04 | 4.200000e-04 | NaN | ... | 0.140 | 0.180 | 1 | 5 | 1209.160 | 22.1950 | 22.1950 | 0.153140 | 1.495997e+04 | 1.495997e+04 |
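The filter above is ordinary pandas boolean indexing. As a minimal sketch, the toy frame below (made-up rows; only the column names controversial and default_set are taken from the NASA dataset) shows how the two conditions combine:

```python
import pandas as pd

# Toy stand-in for nasa_full: made-up rows, column names as in the tutorial.
toy = pd.DataFrame(
    {
        "name": ["A b", "A b", "B c", "C d"],
        "default_set": [0, 1, 1, 1],
        "controversial": [0, 0, 1, 0],
    }
)

# Each comparison yields a boolean Series; "&" combines them element-wise.
# The parentheses are required because "&" binds tighter than "==".
mask = (toy["controversial"] == 0) & (toy["default_set"] == 1)
filtered = toy[mask]

print(filtered["name"].tolist())
print(filtered.index.tolist())
```

The filtered frame keeps the original row labels, which is why the index in the output above jumps from 1 to 25 instead of being renumbered.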
The two datasets contain different numbers of exoplanets:
[7]:
print(f"# of planets in NASA dataset: {nasa.shape[0]}")
print(f"# of planets in EU dataset: {eu.shape[0]}")
# of planets in NASA dataset: 5887
# of planets in EU dataset: 7604
Inspection
Checking the columns, we find that the two data frames share almost all column names:
[8]:
print(f"{'Column':^19s} || {'NASA':^4s} | {'EU':^4s}")
print("-" * 50)
for col in nasa.columns.sort_values():
    print(f"{col:>19s} || {'YES':^4s} | {'YES' if col in eu.columns else 'NO':^4s}")
Column || NASA | EU
--------------------------------------------------
P || YES | YES
P_err_max || YES | YES
P_err_min || YES | YES
a || YES | YES
a_err_max || YES | YES
a_err_min || YES | YES
controversial || YES | NO
default_set || YES | NO
disc_method || YES | YES
disc_year || YES | YES
e || YES | YES
e_err_max || YES | YES
e_err_min || YES | YES
inc || YES | YES
inc_err_max || YES | YES
inc_err_min || YES | YES
mass || YES | YES
mass_err_max || YES | YES
mass_err_min || YES | YES
mass_sin_i || YES | YES
mass_sin_i_err_max || YES | YES
mass_sin_i_err_min || YES | YES
n || YES | YES
n_err_max || YES | YES
n_err_min || YES | YES
n_planets || YES | NO
n_stars || YES | NO
name || YES | YES
radius || YES | YES
radius_err_max || YES | YES
radius_err_min || YES | YES
reference || YES | YES
rowupdate || YES | YES
star_dist || YES | YES
star_dist_err_max || YES | YES
star_dist_err_min || YES | YES
star_mass || YES | YES
star_mass_err_max || YES | YES
star_mass_err_min || YES | YES
star_name || YES | YES
star_radius || YES | YES
star_radius_err_max || YES | YES
star_radius_err_min || YES | YES
tperi || YES | YES
tperi_err_max || YES | YES
tperi_err_min || YES | YES
w || YES | YES
w_err_max || YES | YES
w_err_min || YES | YES
The only differences are four columns that appear only in the NASA data frame:
controversial : Whether the exoplanet is confirmed (0) or not yet (1).
default_set : Whether the solution is the default set (1) or not (0).
n_planets : Number of planets in that system.
n_stars : Number of stars in that system.
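The same comparison can be sketched with set operations. The two sets below are a small hand-picked subset of the column names listed above, used only for illustration; with the real data frames one could build them as set(nasa.columns) and set(eu.columns).

```python
# Subset of the column names from the comparison table above (illustrative only).
nasa_cols = {
    "name", "mass", "radius", "P",
    "controversial", "default_set", "n_planets", "n_stars",
}
eu_cols = {"name", "mass", "radius", "P"}

# Set difference lists the names present on one side only.
only_in_nasa = sorted(nasa_cols - eu_cols)
only_in_eu = sorted(eu_cols - nasa_cols)

print(only_in_nasa)
print(only_in_eu)
```

Unlike the loop above, which only walks over the NASA columns, the difference in both directions would also reveal any column that exists only in the EU frame.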
Plotting the two catalogues together shows that they are similar, but not identical.
[9]:
plt.figure(figsize=(5,4), dpi=130)
plt.plot(eu.mass, eu.radius, ".", ms=10, label="EU")
plt.plot(nasa.mass, nasa.radius, "x", ms=3, label="NASA")
plt.semilogx()
plt.semilogy()
plt.xlabel("Planet mass [M$_\\mathrm{J}$]")
plt.ylabel("Planet radius [R$_\\mathrm{J}$]")
plt.suptitle("Mass vs Radius comparison")
plt.xlim(1e-4, 1e2)
plt.ylim(bottom=1e-2)
plt.legend()
plt.tight_layout()
plt.show()
[10]:
plt.figure(figsize=(5,4), dpi=130)
plt.plot(eu.a, eu.e, ".", ms=10, label="EU")
plt.plot(nasa.a, nasa.e, "x", ms=3, label="NASA")
plt.semilogx()
plt.semilogy()
plt.xlabel("Planet semimajor-axis [AU]")
plt.ylabel("Planet eccentricity")
plt.suptitle("$a$ vs $e$ comparison")
plt.ylim(bottom=1e-5)
plt.legend()
plt.tight_layout()
plt.show()
Slicing
If we need just a specific group of systems or exoplanets, we can treat the ResokitDataSet as if it were a pandas DataFrame and slice it.
For example, to get just the Hot-Jupiter-like exoplanets (\(m \in [0.36,\, 11.8]\,M_J\) and \(P \in [1.3,\, 111]\,days\)), we simply do:
[11]:
hot_jup = eu[(0.36 < eu.mass) & (eu.mass < 11.8) & (1.3 < eu.P) & (eu.P < 111)]
hot_jup
[11]:
| | name | mass | mass_err_min | mass_err_max | mass_sin_i | mass_sin_i_err_min | mass_sin_i_err_max | radius | radius_err_min | radius_err_max | ... | star_dist_err_max | star_mass | star_mass_err_min | star_mass_err_max | star_radius | star_radius_err_min | star_radius_err_max | n | n_err_min | n_err_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 314 | 51 Peg b | 0.470 | 0.070 | 0.030 | 0.460 | 0.060 | 0.010 | 1.900 | 0.300 | 0.300 | ... | 0.02970 | 1.030 | 0.126 | 0.126 | 1.176 | 0.0510 | 0.0510 | 1.485106 | 1.684929e+05 | 1.684929e+05 |
| 318 | 55 Cnc Ab | 0.840 | 0.031 | 0.230 | 0.840 | 0.031 | 0.131 | NaN | NaN | NaN | ... | 0.01235 | 0.900 | 0.115 | 0.115 | 0.963 | 0.0654 | 0.0654 | 0.428794 | 6.981317e+03 | 6.613879e+03 |
| 488 | CoRoT-10 b | 2.750 | 0.140 | 0.140 | NaN | NaN | NaN | 0.970 | 0.050 | 0.050 | ... | 50.00000 | 0.890 | 0.050 | 0.050 | 0.790 | 0.0500 | 0.0500 | 0.474539 | 3.141593e+04 | 3.141593e+04 |
| 489 | CoRoT-11 b | 2.330 | 0.270 | 0.270 | NaN | NaN | NaN | 1.430 | 0.033 | 0.033 | ... | 30.00000 | 1.270 | 0.050 | 0.050 | 1.360 | 0.1300 | 0.1300 | 2.098365 | 2.991993e+05 | 2.991993e+05 |
| 490 | CoRoT-12 b | 0.917 | 0.065 | 0.070 | NaN | NaN | NaN | 1.440 | 0.130 | 0.130 | ... | 85.00000 | 1.078 | 0.072 | 0.072 | 1.116 | 0.0920 | 0.0920 | 2.221736 | 5.235988e+06 | 5.235988e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7578 | XO-3 Ab | 11.790 | 0.590 | 0.590 | NaN | NaN | NaN | 1.217 | 0.073 | 0.073 | ... | 23.00000 | 1.410 | 0.080 | 0.080 | 1.490 | 0.0800 | 0.0800 | 1.968710 | 2.731820e+04 | 2.731820e+04 |
| 7579 | XO-4 b | 1.616 | 0.100 | 0.101 | 1.615 | 0.100 | 0.100 | 1.317 | 0.029 | 0.040 | ... | 19.00000 | 1.320 | 0.020 | 0.020 | 1.550 | 0.0500 | 0.0500 | 1.523296 | 1.030030e+04 | 1.336848e+04 |
| 7580 | XO-5 b | 1.077 | 0.037 | 0.037 | NaN | NaN | NaN | 1.030 | 0.050 | 0.050 | ... | 13.00000 | 0.880 | 0.030 | 0.030 | 1.060 | 0.0500 | 0.0500 | 1.500371 | 3.695991e+06 | 3.695991e+06 |
| 7581 | XO-6 b | 4.470 | 0.120 | 0.120 | NaN | NaN | NaN | 2.170 | 0.200 | 0.200 | ... | 79.00000 | 1.470 | 0.060 | 0.060 | 1.930 | 0.1800 | 0.1800 | 1.668844 | 1.396263e+07 | 1.396263e+07 |
| 7582 | XO-7 b | 0.726 | 0.038 | 0.038 | NaN | NaN | NaN | 1.346 | 0.020 | 0.020 | ... | 1.20000 | 1.405 | 0.059 | 0.059 | 1.480 | 0.0220 | 0.0220 | 2.193748 | 1.142397e+07 | 1.142397e+07 |
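The chained comparisons used for the slice can also be written with the pandas Series.between method. The sketch below uses a made-up toy frame and assumes pandas 1.3 or later, where the inclusive argument accepts the string "neither":

```python
import pandas as pd

# Toy stand-in for the EU dataset: made-up planets, same column names.
toy = pd.DataFrame(
    {
        "name": ["p1", "p2", "p3", "p4"],
        "mass": [0.5, 0.2, 5.0, 11.8],  # Jupiter masses
        "P": [3.0, 4.0, 200.0, 10.0],   # days
    }
)

# Explicit form, as in the tutorial cell (strict inequalities).
explicit = toy[(0.36 < toy.mass) & (toy.mass < 11.8) & (1.3 < toy.P) & (toy.P < 111)]

# Equivalent form with between(); "neither" excludes both endpoints.
compact = toy[
    toy.mass.between(0.36, 11.8, inclusive="neither")
    & toy.P.between(1.3, 111, inclusive="neither")
]

print(compact["name"].tolist())
```

Because the inequalities are strict, a planet sitting exactly on a boundary (p4, with a mass of 11.8) is excluded; passing inclusive="both" would instead match the closed intervals written in the text.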
[12]:
plt.figure(figsize=(5,4), dpi=130)
plt.plot(hot_jup.mass, hot_jup.radius, ".", ms=10)
plt.semilogx()
# plt.semilogy()
plt.xlabel("Mass [M$_J$]")
plt.ylabel("Radius [R$_J$]")
plt.suptitle("Hot Jupiters $mass$ vs $radius$ comparison")
plt.tight_layout()
plt.show()
Load full dataset
Some planet and star data are not shown above because the load function trims the full data when to_resokit=True, as in the calls so far.
All the data can be obtained with to_resokit=False; to get a plain pandas DataFrame, also pass to_df=True. Note that with to_resokit=False the column names are the original ones from each database, rather than being mapped to more comprehensible ones.
[13]:
nasa_original = datasets.load('nasa', to_resokit=False)
nasa_original
Loaded full dataset from memory stored datasets.
[13]:
| | pl_name | pl_letter | hostname | hd_name | hip_name | tic_id | gaia_id | default_flag | pl_refname | sy_refname | ... | sy_jmagerr1 | sy_jmagerr2 | sy_jmagstr | sy_hmag | sy_hmagerr1 | sy_hmagerr2 | sy_hmagstr | sy_kmag | sy_kmagerr1 | sy_kmagerr2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kepler-6 b | b | Kepler-6 | NaN | NaN | TIC 27916356 | Gaia DR2 2086636884980514304 | 0 | <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.021 | -0.021 | 12.001±0.021 | 11.706 | 0.019 | -0.019 | 11.706±0.019 | 11.634 | 0.019 | -0.019 |
| 1 | Kepler-6 b | b | Kepler-6 | NaN | NaN | TIC 27916356 | Gaia DR2 2086636884980514304 | 1 | <a refstr=ESTEVES_ET_AL__2015 href=https://ui.... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.021 | -0.021 | 12.001±0.021 | 11.706 | 0.019 | -0.019 | 11.706±0.019 | 11.634 | 0.019 | -0.019 |
| 2 | Kepler-6 b | b | Kepler-6 | NaN | NaN | TIC 27916356 | Gaia DR2 2086636884980514304 | 0 | <a refstr=Q1_Q17_DR24_KOI_TABLE href=https://e... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.021 | -0.021 | 12.001±0.021 | 11.706 | 0.019 | -0.019 | 11.706±0.019 | 11.634 | 0.019 | -0.019 |
| 3 | Kepler-6 b | b | Kepler-6 | NaN | NaN | TIC 27916356 | Gaia DR2 2086636884980514304 | 0 | <a refstr=EXOFOP_TESS_TOI href=https://exofop.... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.021 | -0.021 | 12.001±0.021 | 11.706 | 0.019 | -0.019 | 11.706±0.019 | 11.634 | 0.019 | -0.019 |
| 4 | Kepler-6 b | b | Kepler-6 | NaN | NaN | TIC 27916356 | Gaia DR2 2086636884980514304 | 0 | <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.021 | -0.021 | 12.001±0.021 | 11.706 | 0.019 | -0.019 | 11.706±0.019 | 11.634 | 0.019 | -0.019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38495 | Kepler-33 f | f | Kepler-33 | NaN | NaN | TIC 158935283 | Gaia DR2 2127355923723254272 | 0 | <a refstr=HADDEN__AMP__LITHWICK_2016 href=http... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.020 | -0.020 | 12.871±0.020 | 12.620 | 0.021 | -0.021 | 12.620±0.021 | 12.591 | 0.022 | -0.022 |
| 38496 | Kepler-33 f | f | Kepler-33 | NaN | NaN | TIC 158935283 | Gaia DR2 2127355923723254272 | 0 | <a refstr=FULTON__AMP__PETIGURA_2018 href=http... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.020 | -0.020 | 12.871±0.020 | 12.620 | 0.021 | -0.021 | 12.620±0.021 | 12.591 | 0.022 | -0.022 |
| 38497 | Kepler-33 f | f | Kepler-33 | NaN | NaN | TIC 158935283 | Gaia DR2 2127355923723254272 | 0 | <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.020 | -0.020 | 12.871±0.020 | 12.620 | 0.021 | -0.021 | 12.620±0.021 | 12.591 | 0.022 | -0.022 |
| 38498 | Kepler-33 f | f | Kepler-33 | NaN | NaN | TIC 158935283 | Gaia DR2 2127355923723254272 | 0 | <a refstr=BERGER_ET_AL__2018 href=https://ui.a... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.020 | -0.020 | 12.871±0.020 | 12.620 | 0.021 | -0.021 | 12.620±0.021 | 12.591 | 0.022 | -0.022 |
| 38499 | Kepler-33 f | f | Kepler-33 | NaN | NaN | TIC 158935283 | Gaia DR2 2127355923723254272 | 0 | <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... | <a refstr=STASSUN_ET_AL__2019 href=https://ui.... | ... | 0.020 | -0.020 | 12.871±0.020 | 12.620 | 0.021 | -0.021 | 12.620±0.021 | 12.591 | 0.022 | -0.022 |
The number of columns is now much larger than before:
[14]:
print("# of columns in nasa (to_resokit=True):", nasa.shape[1])
print("# of columns in nasa (to_resokit=False):", nasa_original.shape[1])
# of columns in nasa (to_resokit=True): 49
# of columns in nasa (to_resokit=False): 354
Also, mass and radius are not in this data frame; it has pl_massj and pl_radj instead.
[15]:
print("'mass' in nasa_original:", "mass" in nasa_original.columns)
print("'radius' in nasa_original:", "radius" in nasa_original.columns)
print("'pl_massj' in nasa_original:", "pl_massj" in nasa_original.columns)
print("'pl_radj' in nasa_original:", "pl_radj" in nasa_original.columns)
'mass' in nasa_original: False
'radius' in nasa_original: False
'pl_massj' in nasa_original: True
'pl_radj' in nasa_original: True
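If only a few of the original columns are needed under shorter names, they can be renamed by hand with pandas. The mapping below is a hypothetical illustration built from the column names seen above; it is not ResoKit's internal mapping, which may differ.

```python
import pandas as pd

# Toy frame with original NASA-style column names (made-up values).
raw = pd.DataFrame({"pl_name": ["X b"], "pl_massj": [1.2], "pl_radj": [1.1]})

# Hypothetical mapping, for illustration only.
column_map = {"pl_name": "name", "pl_massj": "mass", "pl_radj": "radius"}

# rename() returns a new frame; columns missing from the mapping are kept as-is.
renamed = raw.rename(columns=column_map)

print(list(renamed.columns))
```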
Default units
More information about the units can be found at:
but the basics are:
Planet:
mass: Jupiter masses
radius: Jupiter radii
semimajor axis: au
period (time): days
Star:
mass: Solar masses
radius: Solar radii
distance: parsecs
period (time): days
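As a small worked example of these units, the sketch below converts a planet's mass and radius from Jupiter units to Earth units. The conversion factors are approximate, rounded values, and the helper names are hypothetical (they are not part of ResoKit).

```python
# Approximate, rounded conversion factors.
EARTH_MASSES_PER_JUPITER_MASS = 317.8
EARTH_RADII_PER_JUPITER_RADIUS = 11.21  # ratio of equatorial radii


def to_earth_masses(mass_mjup: float) -> float:
    """Convert a mass in Jupiter masses to Earth masses."""
    return mass_mjup * EARTH_MASSES_PER_JUPITER_MASS


def to_earth_radii(radius_rjup: float) -> float:
    """Convert a radius in Jupiter radii to Earth radii."""
    return radius_rjup * EARTH_RADII_PER_JUPITER_RADIUS


# 51 Peg b as listed in the EU table above: 0.470 M_J and 1.900 R_J.
mass_earth = to_earth_masses(0.470)
radius_earth = to_earth_radii(1.900)

print(f"{mass_earth:.1f} Earth masses, {radius_earth:.1f} Earth radii")
```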
Update
After loading a dataset, we can use check_outdated to find out when it was last updated on our local machine and whether the online version is newer.
[16]:
datasets.check_outdated('both')
Checking local dataset from which='eu' source...
Last modified: 0 days ago.
Number of planets in stored dataset: 7604
Checking online dataset from eu...
Number of planets in online dataset: 7604
Last online update: 2025-08-20 00:00:00 (0 days ago)
Dataset is already up-to-date.
Checking local dataset from which='nasa' source...
Last modified: 57 days ago.
Number of planets in stored dataset: 5921
(Including only default parameters sets.)
Checking online dataset from nasa...
Number of planets in online dataset: 5983
Last online update: 2025-08-14 00:00:00 (6 days ago)
The online dataset has more rows than the stored dataset.
The dataset is outdated.
[16]:
(False, True)
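The returned tuple can be unpacked to decide which datasets to refresh. The sketch below uses the value shown above as a stand-in and assumes the flags follow the order of the log output (EU first, then NASA); that ordering is inferred from the output, not from the ResoKit documentation.

```python
# Stand-in for the value returned by datasets.check_outdated('both') above.
result = (False, True)

# Assumption: the flags are ordered as in the log output, EU first, then NASA.
eu_outdated, nasa_outdated = result

# Collect the names of the datasets flagged as outdated.
flags = {"eu": eu_outdated, "nasa": nasa_outdated}
to_refresh = [name for name, outdated in flags.items() if outdated]

print(to_refresh)
```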
To update a whole dataset, we can use the download function with overwrite=True. If check_outd=True, an outdated check is performed first, and the download is skipped when the local copy is already up to date.
[17]:
datasets.download('eu', overwrite=True)
Checking local dataset from which='eu' source...
Last modified: 0 days ago.
Number of planets in stored dataset: 7604
Checking online dataset from eu...
Number of planets in online dataset: 7604
Last online update: 2025-08-20 00:00:00 (0 days ago)
Dataset is already up-to-date.
No need to download the dataset. Set check_outd=False to really force it.
To store the in-memory dataset in a file, we can use the to_file method, or even the pandas to_csv method.
[18]:
eu.to_file("my_eu.csv")
Dataset saved to my_eu.csv.
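For the pandas route, a minimal round trip looks like the sketch below. It uses a plain toy DataFrame and a temporary directory rather than a ResokitDataSet, so it only illustrates the to_csv and read_csv calls themselves.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame with a missing value, as is common in the catalogues.
toy = pd.DataFrame({"name": ["X b", "Y c"], "mass": [1.2, None]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.csv"
    # index=False avoids writing the row labels as an extra unnamed column.
    toy.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(list(restored.columns))
print(restored["mass"].isna().tolist())
```

Missing values are written as empty fields and read back as NaN, so they survive the round trip.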
Another possibility is to use the update function. It is a wrapper around download that queries only the rows updated online since the latest row update in the local dataset.
NOTE: This feature is not available for the EU dataset (yet).
Let’s show it with the NASA dataset.
[19]:
datasets.update("nasa", overwrite=True)
Loaded full dataset from memory stored datasets.
Checking local dataset from which='nasa' source...
Last modified: 57 days ago.
Number of planets in stored dataset: 5921
(Including only default parameters sets.)
Checking online dataset from nasa...
Number of planets in online dataset: 5983
Last online update: 2025-08-14 00:00:00 (6 days ago)
The online dataset has more rows than the stored dataset.
The dataset is outdated.
Latest row update in local dataset: 2025-06-11
Querying online rows update after that date.
Executing the query...
Query executed successfully.
Amount of rows downloaded: 280
Rows in only old | both | only new: 38428 | 72 | 208
Removed files: nasa.csv from /home/egianuzzi/.resokit_data/nasa_exoplanets.zip
Written nasa.csv to /home/egianuzzi/.resokit_data/nasa_exoplanets.zip.
Updated stored index in memory.
Stored dataset in memory.
[19]:
PosixPath('/home/egianuzzi/.resokit_data/nasa_exoplanets.zip/nasa.csv')
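In the log above, the 280 downloaded rows split into 72 that already existed locally and 208 that are new (72 + 208 = 280), while 38428 local rows were left untouched. The sketch below illustrates this kind of bookkeeping on toy data with a pandas outer merge; the key column is hypothetical, and this is not ResoKit's actual implementation, which the tutorial does not show.

```python
import pandas as pd

# Toy "stored" and "freshly queried" tables sharing a hypothetical key column.
old = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
new = pd.DataFrame({"key": ["c", "d"], "value": [30, 4]})

# Outer merge on the key; the indicator column records where each key was found.
merged = old.merge(
    new, on="key", how="outer", suffixes=("_old", "_new"), indicator=True
)
counts = merged["_merge"].value_counts()
only_old = int(counts["left_only"])
both = int(counts["both"])
only_new = int(counts["right_only"])
print(f"Rows in only old | both | only new: {only_old} | {both} | {only_new}")

# Updated table: rows from the new query replace old rows with the same key.
updated = (
    pd.concat([old, new])
    .drop_duplicates(subset="key", keep="last")
    .sort_values("key")
    .reset_index(drop=True)
)
print(updated["value"].tolist())
```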
If an update is requested and check_outd is True, no new rows are downloaded unless they are needed.
[20]:
datasets.update("nasa", overwrite=True)
Loaded full dataset from memory stored datasets.
Checking local dataset from which='nasa' source...
Last modified: 0 days ago.
Number of planets in stored dataset: 5998
(Including only default parameters sets.)
Checking online dataset from nasa...
Number of planets in online dataset: 5983
Last online update: 2025-08-14 00:00:00 (6 days ago)
The online dataset has less rows than the stored dataset.
This could be the result of some online row(s) deleted.
Although this is usually not a problem, running
`resokit.datasets.download(which='nasa')` could solve it if needed.
No need to download the dataset. Set check_outd=False to really force it.