ResoKit.datasets Tutorial

This tutorial shows, step by step, how to get started with the ResoKit.datasets package. ResoKit.datasets provides tools for downloading and loading the available datasets: the exoplanet.eu catalogue ('eu') and the NASA exoplanet dataset ('nasa').


Import ResoKit and the necessary packages

[1]:
import resokit.datasets as datasets

import matplotlib.pyplot as plt

Load the dataset

Let’s start with the EU catalogue.

First, download the dataset if it has not been downloaded yet.

[2]:
datasets.download('eu', overwrite=False, soft=True)
Checking local dataset from which='eu' source...
 File from which='eu' source to check if outdated not found.
Downloading data from https://exoplanet.eu/catalog/csv/...
 Note: Progress is shown at every 0.15 MB
 Data downloaded successfully. (3.39 MB)
Creating the ZIP archive /home/egianuzzi/.resokit_data/exoplanet_eu.zip...
 Written exoplanet_eu.csv to /home/egianuzzi/.resokit_data/exoplanet_eu.zip.
Updated stored index in memory.
Stored dataset in memory.
[2]:
PosixPath('/home/egianuzzi/.resokit_data/exoplanet_eu.zip/exoplanet_eu.csv')
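
The path returned above points to a CSV file stored inside a ZIP archive. As a rough illustration of how such a file can be read without extracting it, here is a minimal sketch that uses only the Python standard library and a small made-up archive. It is not necessarily how ResoKit reads its files, and the file names and rows are toy values.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path

# Build a tiny throwaway archive so the sketch is self-contained.
# The file names and rows are toy values, not real catalogue data.
tmp_dir = Path(tempfile.mkdtemp())
zip_path = tmp_dir / "toy_catalogue.zip"
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("toy_catalogue.csv", "name,mass\nplanet_a,1.5\nplanet_b,0.3\n")

# Read the CSV member directly from the archive, without extracting it.
with zipfile.ZipFile(zip_path) as zf:
    with zf.open("toy_catalogue.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))

print(rows[0]["name"], rows[0]["mass"])
```

Note that csv.DictReader returns every value as a string. pandas.read_csv can also read a ZIP archive directly when it contains a single CSV file, and it parses numeric columns on the way.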

Now, load it.

[3]:
datasets.load('eu')
 Loaded full dataset from memory stored datasets.
[3]:
name mass mass_err_min mass_err_max mass_sin_i mass_sin_i_err_min mass_sin_i_err_max radius radius_err_min radius_err_max ... star_dist_err_max star_mass star_mass_err_min star_mass_err_max star_radius star_radius_err_min star_radius_err_max n n_err_min n_err_max
0 109 Psc b 5.743 0.28900 1.01100 6.3830 0.07800 0.07800 1.152 NaN NaN ... 0.88000 1.13 0.030 0.030 1.79000 0.1700 0.1700 0.005843 7.853982e+00 8.975979e+00
1 112 Psc b NaN 0.00500 0.00400 0.0330 0.00500 0.00400 NaN NaN NaN ... 0.10695 1.10 0.133 0.133 1.80100 0.0725 0.0725 1.427997 3.141593e+04 1.570796e+04
2 112 Psc c 9.866 1.78100 3.19000 NaN NaN NaN NaN NaN NaN ... 0.10695 1.10 0.133 0.133 1.80100 0.0725 0.0725 0.000173 6.460195e-04 1.040366e-03
3 11 Com Ab NaN 1.53491 1.53491 16.1284 1.53491 1.53491 NaN NaN NaN ... 10.50000 2.70 0.300 0.300 19.00000 2.0000 2.0000 0.019272 1.963495e+01 1.963495e+01
4 11 UMi b NaN 1.10000 1.10000 11.0873 1.10000 1.10000 NaN NaN NaN ... 6.90000 1.80 0.250 0.250 24.08000 1.8400 1.8400 0.012172 1.933288e+00 1.933288e+00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7599 ZTF J1406+1222 Ab 50.000 NaN NaN NaN NaN NaN 0.292 NaN NaN ... 200.00000 1.40 NaN NaN 0.00002 NaN NaN 116.233408 NaN NaN
7600 ZTF J1622+47 b 61.000 19.00000 19.00000 NaN NaN NaN 0.980 0.02 0.02 ... NaN 0.47 NaN NaN 0.18200 0.0040 0.0040 90.031428 NaN NaN
7601 ZTF J1637+49 b 23.000 8.00000 8.00000 NaN NaN NaN 0.680 0.07 0.07 ... 8.00000 0.90 0.050 0.050 0.00900 0.0010 0.0010 146.120589 NaN NaN
7602 ZTF J1828+2308 b 19.500 0.80000 0.80000 NaN NaN NaN 1.020 0.02 0.02 ... 5.00000 0.61 0.040 0.040 0.01310 0.0002 0.0002 56.096513 7.222052e+06 7.222052e+06
7603 ZTF J2252-05 b 26.000 8.00000 8.00000 NaN NaN NaN 0.490 0.04 0.04 ... 82.00000 0.76 0.050 0.050 0.01000 0.0010 0.0010 261.799388 NaN NaN
Full ResokitDataSet - 7604 rows x 45 columns

The dataset is now stored in memory, so it can be accessed faster than by reading the file again.

[4]:
eu = datasets.load('eu')
eu
 Loaded full dataset from memory stored datasets.
[4]:
name mass mass_err_min mass_err_max mass_sin_i mass_sin_i_err_min mass_sin_i_err_max radius radius_err_min radius_err_max ... star_dist_err_max star_mass star_mass_err_min star_mass_err_max star_radius star_radius_err_min star_radius_err_max n n_err_min n_err_max
0 109 Psc b 5.743 0.28900 1.01100 6.3830 0.07800 0.07800 1.152 NaN NaN ... 0.88000 1.13 0.030 0.030 1.79000 0.1700 0.1700 0.005843 7.853982e+00 8.975979e+00
1 112 Psc b NaN 0.00500 0.00400 0.0330 0.00500 0.00400 NaN NaN NaN ... 0.10695 1.10 0.133 0.133 1.80100 0.0725 0.0725 1.427997 3.141593e+04 1.570796e+04
2 112 Psc c 9.866 1.78100 3.19000 NaN NaN NaN NaN NaN NaN ... 0.10695 1.10 0.133 0.133 1.80100 0.0725 0.0725 0.000173 6.460195e-04 1.040366e-03
3 11 Com Ab NaN 1.53491 1.53491 16.1284 1.53491 1.53491 NaN NaN NaN ... 10.50000 2.70 0.300 0.300 19.00000 2.0000 2.0000 0.019272 1.963495e+01 1.963495e+01
4 11 UMi b NaN 1.10000 1.10000 11.0873 1.10000 1.10000 NaN NaN NaN ... 6.90000 1.80 0.250 0.250 24.08000 1.8400 1.8400 0.012172 1.933288e+00 1.933288e+00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7599 ZTF J1406+1222 Ab 50.000 NaN NaN NaN NaN NaN 0.292 NaN NaN ... 200.00000 1.40 NaN NaN 0.00002 NaN NaN 116.233408 NaN NaN
7600 ZTF J1622+47 b 61.000 19.00000 19.00000 NaN NaN NaN 0.980 0.02 0.02 ... NaN 0.47 NaN NaN 0.18200 0.0040 0.0040 90.031428 NaN NaN
7601 ZTF J1637+49 b 23.000 8.00000 8.00000 NaN NaN NaN 0.680 0.07 0.07 ... 8.00000 0.90 0.050 0.050 0.00900 0.0010 0.0010 146.120589 NaN NaN
7602 ZTF J1828+2308 b 19.500 0.80000 0.80000 NaN NaN NaN 1.020 0.02 0.02 ... 5.00000 0.61 0.040 0.040 0.01310 0.0002 0.0002 56.096513 7.222052e+06 7.222052e+06
7603 ZTF J2252-05 b 26.000 8.00000 8.00000 NaN NaN NaN 0.490 0.04 0.04 ... 82.00000 0.76 0.050 0.050 0.01000 0.0010 0.0010 261.799388 NaN NaN
Full ResokitDataSet - 7604 rows x 45 columns
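
The idea of keeping a loaded dataset in memory can be sketched with a plain dictionary cache. This is only an illustration of the general technique: `_CACHE`, `slow_read` and `load_cached` are hypothetical names, and ResoKit's internal storage may work differently.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: `_CACHE` and `load_cached` are illustrative names,
# not part of ResoKit.
_CACHE: Dict[str, List[dict]] = {}
read_calls = {"count": 0}


def slow_read(source: str) -> List[dict]:
    """Pretend to parse a file from disk (returns toy data)."""
    read_calls["count"] += 1
    return [{"source": source, "name": "planet_a"}]


def load_cached(
    source: str, reader: Callable[[str], List[dict]] = slow_read
) -> List[dict]:
    """Return the cached table if present; otherwise read it and cache it."""
    if source not in _CACHE:
        _CACHE[source] = reader(source)
    return _CACHE[source]


first = load_cached("eu")
second = load_cached("eu")  # served from memory, no second read
print(read_calls["count"])
```

The second call returns the very same object as the first one, so the file is parsed only once.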

Let’s load the NASA dataset too.

[5]:
datasets.download('nasa', overwrite=False, soft=True)
nasa_full = datasets.load("nasa")
nasa_full
Zip file /home/egianuzzi/.resokit_data/nasa_exoplanets.zip already exists. Set overwrite=True to force the download.
 Loading the entire dataset...
  Reading nasa.csv directly from /home/egianuzzi/.resokit_data/nasa_exoplanets.zip...
Updated stored index in memory.
Stored dataset in memory.
[5]:
name star_name default_set reference disc_year disc_method P P_err_min P_err_max w_err_min ... star_radius_err_min star_radius_err_max n_stars n_planets star_dist star_dist_err_min star_dist_err_max n n_err_min n_err_max
0 Kepler-6 b Kepler-6 0 <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... 2009 Transit 3.234699 1.130000e-07 1.130000e-07 NaN ... 0.093000 0.099000 1 1 587.039 5.071 4.986 1.942433 5.560341e+07 5.560341e+07
1 Kepler-6 b Kepler-6 1 <a refstr=ESTEVES_ET_AL__2015 href=https://ui.... 2009 Transit 3.234700 4.000000e-07 4.000000e-07 NaN ... 0.017000 0.034000 1 1 587.039 5.071 4.986 1.942432 1.570796e+07 1.570796e+07
2 Kepler-6 b Kepler-6 0 <a refstr=Q1_Q17_DR24_KOI_TABLE href=https://e... 2009 Transit 3.234699 1.130000e-07 1.130000e-07 NaN ... 0.093000 0.099000 1 1 587.039 5.071 4.986 1.942433 5.560341e+07 5.560341e+07
3 Kepler-6 b Kepler-6 0 <a refstr=EXOFOP_TESS_TOI href=https://exofop.... 2009 Transit 3.234694 6.012800e-06 6.012800e-06 NaN ... 0.059593 0.059593 1 1 587.039 5.071 4.986 1.942436 1.044968e+06 1.044968e+06
4 Kepler-6 b Kepler-6 0 <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... 2009 Transit 3.234699 3.000000e-08 3.000000e-08 NaN ... 0.072914 0.046272 1 1 587.039 5.071 4.986 1.942433 2.094395e+08 2.094395e+08
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
38495 Kepler-33 f Kepler-33 0 <a refstr=HADDEN__AMP__LITHWICK_2016 href=http... 2011 Transit 41.029000 NaN NaN NaN ... 0.260000 0.420000 1 5 1209.160 22.195 22.195 0.153140 NaN NaN
38496 Kepler-33 f Kepler-33 0 <a refstr=FULTON__AMP__PETIGURA_2018 href=http... 2011 Transit 41.028064 1.385000e-04 1.385000e-04 NaN ... 0.047000 0.045000 1 5 1209.160 22.195 22.195 0.153144 4.536596e+04 4.536596e+04
38497 Kepler-33 f Kepler-33 0 <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... 2011 Transit 41.028139 9.930000e-06 9.930000e-06 NaN ... NaN NaN 1 5 1209.160 22.195 22.195 0.153143 6.327478e+05 6.327478e+05
38498 Kepler-33 f Kepler-33 0 <a refstr=BERGER_ET_AL__2018 href=https://ui.a... 2011 Transit NaN NaN NaN NaN ... 0.076000 0.073000 1 5 1209.160 22.195 22.195 NaN NaN NaN
38499 Kepler-33 f Kepler-33 0 <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... 2011 Transit 41.028064 1.385000e-04 1.385000e-04 NaN ... 0.399000 0.381000 1 5 1209.160 22.195 22.195 0.153144 4.536596e+04 4.536596e+04
Full ResokitDataSet - 38500 rows x 49 columns

The NASA dataset contains all exoplanet solutions (including non-default and controversial ones), so we filter them here, keeping only the default, non-controversial solutions:

[6]:
nasa = nasa_full[(nasa_full.controversial == 0) & (nasa_full.default_set == 1)]
nasa
[6]:
name star_name default_set reference disc_year disc_method P P_err_min P_err_max w_err_min ... star_radius_err_min star_radius_err_max n_stars n_planets star_dist star_dist_err_min star_dist_err_max n n_err_min n_err_max
1 Kepler-6 b Kepler-6 1 <a refstr=ESTEVES_ET_AL__2015 href=https://ui.... 2009 Transit 3.234700 4.000000e-07 4.000000e-07 NaN ... 0.017 0.034 1 1 587.039 5.0710 4.9860 1.942432 1.570796e+07 1.570796e+07
25 Kepler-491 b Kepler-491 1 <a refstr=MORTON_ET_AL__2016 href=https://ui.a... 2016 Transit 4.225385 2.680000e-07 2.680000e-07 NaN ... 0.092 0.059 1 1 630.785 6.4310 6.3050 1.487009 2.344472e+07 2.344472e+07
34 Kepler-257 b Kepler-257 1 <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... 2014 Transit 2.382667 6.000000e-06 6.000000e-06 NaN ... 0.518 0.518 1 3 780.256 17.5400 16.7990 2.637039 1.047198e+06 1.047198e+06
43 Kepler-216 b Kepler-216 1 <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... 2014 Transit 7.693641 3.100000e-05 3.100000e-05 NaN ... 0.243 0.243 1 2 1187.470 21.4200 21.4200 0.816673 2.026834e+05 2.026834e+05
67 Kepler-32 c Kepler-32 1 <a refstr=FABRYCKY_ET_AL__2012 href=https://ui... 2011 Transit 8.752200 3.000000e-04 3.000000e-04 NaN ... 0.040 0.040 1 5 323.847 3.4025 3.4025 0.717898 2.094395e+04 2.094395e+04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
38444 Kepler-561 c Kepler-561 1 <a refstr=MORTON_ET_AL__2016 href=https://ui.a... 2016 Transit 5.350162 4.435000e-06 4.435000e-06 NaN ... 0.164 0.082 1 2 621.378 9.3350 9.3350 1.174392 1.416727e+06 1.416727e+06
38458 Kepler-215 b Kepler-215 1 <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... 2014 Transit 9.360672 4.000000e-05 4.000000e-05 NaN ... 0.236 0.236 1 4 485.997 3.6540 3.6540 0.671232 1.570796e+05 1.570796e+05
38462 Kepler-62 b Kepler-62 1 <a refstr=BORUCKI_ET_AL__2013 href=https://ui.... 2013 Transit 5.714932 9.000000e-06 9.000000e-06 NaN ... 0.020 0.020 1 5 300.874 1.2190 1.2190 1.099433 6.981317e+05 6.981317e+05
38477 Kepler-192 c Kepler-192 1 <a refstr=ROWE_ET_AL__2014 href=https://ui.ads... 2014 Transit 21.223400 7.600000e-05 7.600000e-05 NaN ... 0.164 0.164 1 3 651.674 6.7435 6.7435 0.296050 8.267349e+04 8.267349e+04
38486 Kepler-33 f Kepler-33 1 <a refstr=LISSAUER_ET_AL__2012 href=https://ui... 2011 Transit 41.029020 4.200000e-04 4.200000e-04 NaN ... 0.140 0.180 1 5 1209.160 22.1950 22.1950 0.153140 1.495997e+04 1.495997e+04
Partial ResokitDataSet - 5887 rows x 49 columns

We can see that the two datasets contain a different number of exoplanets:

[7]:
print(f"# of planets in NASA dataset: {nasa.shape[0]}")
print(f"# of planets in EU dataset: {eu.shape[0]}")
# of planets in NASA dataset: 5887
# of planets in EU dataset: 7604

Inspection

Checking the columns, we find that the two data frames share almost all of their column names:

[8]:
print(f"{'Column':^19s} || {'NASA':^4s} | {'EU':^4s}")
print("-" * 50)
for col in nasa.columns.sort_values():
    print(f"{col:>19s} || {'YES':^4s} | {'YES' if col in eu.columns else 'NO':^4s}")
      Column        || NASA |  EU
--------------------------------------------------
                  P || YES  | YES
          P_err_max || YES  | YES
          P_err_min || YES  | YES
                  a || YES  | YES
          a_err_max || YES  | YES
          a_err_min || YES  | YES
      controversial || YES  |  NO
        default_set || YES  |  NO
        disc_method || YES  | YES
          disc_year || YES  | YES
                  e || YES  | YES
          e_err_max || YES  | YES
          e_err_min || YES  | YES
                inc || YES  | YES
        inc_err_max || YES  | YES
        inc_err_min || YES  | YES
               mass || YES  | YES
       mass_err_max || YES  | YES
       mass_err_min || YES  | YES
         mass_sin_i || YES  | YES
 mass_sin_i_err_max || YES  | YES
 mass_sin_i_err_min || YES  | YES
                  n || YES  | YES
          n_err_max || YES  | YES
          n_err_min || YES  | YES
          n_planets || YES  |  NO
            n_stars || YES  |  NO
               name || YES  | YES
             radius || YES  | YES
     radius_err_max || YES  | YES
     radius_err_min || YES  | YES
          reference || YES  | YES
          rowupdate || YES  | YES
          star_dist || YES  | YES
  star_dist_err_max || YES  | YES
  star_dist_err_min || YES  | YES
          star_mass || YES  | YES
  star_mass_err_max || YES  | YES
  star_mass_err_min || YES  | YES
          star_name || YES  | YES
        star_radius || YES  | YES
star_radius_err_max || YES  | YES
star_radius_err_min || YES  | YES
              tperi || YES  | YES
      tperi_err_max || YES  | YES
      tperi_err_min || YES  | YES
                  w || YES  | YES
          w_err_max || YES  | YES
          w_err_min || YES  | YES
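
The same comparison can be made more compactly with set operations. The sketch below uses a subset of the column names from the table above, typed in by hand so that it runs on its own. With the real objects, the analogous expression would be `set(nasa.columns) - set(eu.columns)`, assuming `columns` behaves as it does in pandas.

```python
# Column names copied from the comparison table above (a subset, for brevity).
nasa_columns = [
    "P", "a", "e", "mass", "radius", "name", "star_name",
    "controversial", "default_set", "n_planets", "n_stars",
]
eu_columns = ["P", "a", "e", "mass", "radius", "name", "star_name"]

# Set difference keeps the names present on one side only.
only_in_nasa = sorted(set(nasa_columns) - set(eu_columns))
only_in_eu = sorted(set(eu_columns) - set(nasa_columns))

print("Only in NASA:", only_in_nasa)
print("Only in EU:", only_in_eu)
```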

The only differences are four columns that exist only in the NASA dataset:

  • controversial : Whether the exoplanet is confirmed (0) or still controversial (1).

  • default_set : Whether the solution is the default set (1) or not (0).

  • n_planets : Number of planets in the system.

  • n_stars : Number of stars in the system.

Plotting the same quantities from both datasets shows that they are similar, but not identical.

[9]:
plt.figure(figsize=(5,4), dpi=130)
plt.plot(eu.mass, eu.radius, ".", ms=10, label="EU")
plt.plot(nasa.mass, nasa.radius, "x", ms=3, label="NASA")
plt.semilogx()
plt.semilogy()
plt.xlabel("Planet mass [M$_\\mathrm{J}$]")
plt.ylabel("Planet radius [R$_\\mathrm{J}$]")
plt.suptitle("Mass vs Radius comparison")
plt.xlim(1e-4, 1e2)
plt.ylim(bottom=1e-2)
plt.legend()
plt.tight_layout()
plt.show()
../_images/tutorials_datasets_19_0.png
[10]:
plt.figure(figsize=(5,4), dpi=130)
plt.plot(eu.a, eu.e, ".", ms=10, label="EU")
plt.plot(nasa.a, nasa.e, "x", ms=3, label="NASA")
plt.semilogx()
plt.semilogy()
plt.xlabel("Planet semimajor-axis [AU]")
plt.ylabel("Planet eccentricity")
plt.suptitle("$a$ vs $e$ comparison")
plt.ylim(bottom=1e-5)
plt.legend()
plt.tight_layout()
plt.show()
../_images/tutorials_datasets_20_0.png

Slicing

If we need only a specific group of systems or exoplanets, we can treat the ResokitDataSet as if it were a pandas DataFrame and slice it.

For example, if we want to get just the hot-Jupiter-like exoplanets (\(0.36 < m < 11.8\,M_J\) and \(1.3 < P < 111\) days), we simply do:

[11]:
hot_jup = eu[(0.36 < eu.mass) & (eu.mass < 11.8) & (1.3 < eu.P) & (eu.P < 111)]
hot_jup
[11]:
name mass mass_err_min mass_err_max mass_sin_i mass_sin_i_err_min mass_sin_i_err_max radius radius_err_min radius_err_max ... star_dist_err_max star_mass star_mass_err_min star_mass_err_max star_radius star_radius_err_min star_radius_err_max n n_err_min n_err_max
314 51 Peg b 0.470 0.070 0.030 0.460 0.060 0.010 1.900 0.300 0.300 ... 0.02970 1.030 0.126 0.126 1.176 0.0510 0.0510 1.485106 1.684929e+05 1.684929e+05
318 55 Cnc Ab 0.840 0.031 0.230 0.840 0.031 0.131 NaN NaN NaN ... 0.01235 0.900 0.115 0.115 0.963 0.0654 0.0654 0.428794 6.981317e+03 6.613879e+03
488 CoRoT-10 b 2.750 0.140 0.140 NaN NaN NaN 0.970 0.050 0.050 ... 50.00000 0.890 0.050 0.050 0.790 0.0500 0.0500 0.474539 3.141593e+04 3.141593e+04
489 CoRoT-11 b 2.330 0.270 0.270 NaN NaN NaN 1.430 0.033 0.033 ... 30.00000 1.270 0.050 0.050 1.360 0.1300 0.1300 2.098365 2.991993e+05 2.991993e+05
490 CoRoT-12 b 0.917 0.065 0.070 NaN NaN NaN 1.440 0.130 0.130 ... 85.00000 1.078 0.072 0.072 1.116 0.0920 0.0920 2.221736 5.235988e+06 5.235988e+06
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7578 XO-3 Ab 11.790 0.590 0.590 NaN NaN NaN 1.217 0.073 0.073 ... 23.00000 1.410 0.080 0.080 1.490 0.0800 0.0800 1.968710 2.731820e+04 2.731820e+04
7579 XO-4 b 1.616 0.100 0.101 1.615 0.100 0.100 1.317 0.029 0.040 ... 19.00000 1.320 0.020 0.020 1.550 0.0500 0.0500 1.523296 1.030030e+04 1.336848e+04
7580 XO-5 b 1.077 0.037 0.037 NaN NaN NaN 1.030 0.050 0.050 ... 13.00000 0.880 0.030 0.030 1.060 0.0500 0.0500 1.500371 3.695991e+06 3.695991e+06
7581 XO-6 b 4.470 0.120 0.120 NaN NaN NaN 2.170 0.200 0.200 ... 79.00000 1.470 0.060 0.060 1.930 0.1800 0.1800 1.668844 1.396263e+07 1.396263e+07
7582 XO-7 b 0.726 0.038 0.038 NaN NaN NaN 1.346 0.020 0.020 ... 1.20000 1.405 0.059 0.059 1.480 0.0220 0.0220 2.193748 1.142397e+07 1.142397e+07
Partial ResokitDataSet - 678 rows x 45 columns
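
The same kind of selection can also be written with the pandas method Series.between. The sketch below runs on a small toy DataFrame with invented values, on the assumption stated above that a ResokitDataSet can be handled like a pandas DataFrame. The string form `inclusive="neither"` requires pandas 1.3 or later.

```python
import pandas as pd

# Toy table with made-up values; only the column names follow the tutorial.
toy = pd.DataFrame(
    {
        "name": ["p1", "p2", "p3", "p4"],
        "mass": [0.5, 0.36, 5.0, 20.0],  # Jupiter masses
        "P": [3.0, 10.0, 500.0, 4.0],  # days
    }
)

# Same strict limits as the mask used above, written with Series.between.
# inclusive="neither" excludes both end points, matching the `<` comparisons.
mask = toy["mass"].between(0.36, 11.8, inclusive="neither") & toy["P"].between(
    1.3, 111, inclusive="neither"
)
selected = toy[mask]
print(selected["name"].tolist())
```

By default, between includes both end points, so the `inclusive` argument matters for values that sit exactly on a limit, such as the 0.36 mass in the toy table.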
[12]:
plt.figure(figsize=(5,4), dpi=130)
plt.plot(hot_jup.mass, hot_jup.radius, ".", ms=10)
plt.semilogx()
# plt.semilogy()
plt.xlabel("Mass [M$_J$]")
plt.ylabel("Radius [R$_J$]")
plt.suptitle("Hot Jupiters $mass$ vs $radius$ comparison")
plt.tight_layout()
plt.show()
../_images/tutorials_datasets_23_0.png

Load full dataset

Some planet and star data are not shown above because the load function trims the full data when to_resokit=True, as in the previous calls.

All data can be obtained with to_resokit=False. To get a pandas DataFrame instead, also pass to_df=True. Note that with to_resokit=False the column names are the original ones from each database, instead of being mapped to more comprehensible ones.

[13]:
nasa_original = datasets.load('nasa', to_resokit=False)
nasa_original
 Loaded full dataset from memory stored datasets.
[13]:
pl_name pl_letter hostname hd_name hip_name tic_id gaia_id default_flag pl_refname sy_refname ... sy_jmagerr1 sy_jmagerr2 sy_jmagstr sy_hmag sy_hmagerr1 sy_hmagerr2 sy_hmagstr sy_kmag sy_kmagerr1 sy_kmagerr2
0 Kepler-6 b b Kepler-6 NaN NaN TIC 27916356 Gaia DR2 2086636884980514304 0 <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.021 -0.021 12.001&plusmn;0.021 11.706 0.019 -0.019 11.706&plusmn;0.019 11.634 0.019 -0.019
1 Kepler-6 b b Kepler-6 NaN NaN TIC 27916356 Gaia DR2 2086636884980514304 1 <a refstr=ESTEVES_ET_AL__2015 href=https://ui.... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.021 -0.021 12.001&plusmn;0.021 11.706 0.019 -0.019 11.706&plusmn;0.019 11.634 0.019 -0.019
2 Kepler-6 b b Kepler-6 NaN NaN TIC 27916356 Gaia DR2 2086636884980514304 0 <a refstr=Q1_Q17_DR24_KOI_TABLE href=https://e... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.021 -0.021 12.001&plusmn;0.021 11.706 0.019 -0.019 11.706&plusmn;0.019 11.634 0.019 -0.019
3 Kepler-6 b b Kepler-6 NaN NaN TIC 27916356 Gaia DR2 2086636884980514304 0 <a refstr=EXOFOP_TESS_TOI href=https://exofop.... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.021 -0.021 12.001&plusmn;0.021 11.706 0.019 -0.019 11.706&plusmn;0.019 11.634 0.019 -0.019
4 Kepler-6 b b Kepler-6 NaN NaN TIC 27916356 Gaia DR2 2086636884980514304 0 <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.021 -0.021 12.001&plusmn;0.021 11.706 0.019 -0.019 11.706&plusmn;0.019 11.634 0.019 -0.019
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
38495 Kepler-33 f f Kepler-33 NaN NaN TIC 158935283 Gaia DR2 2127355923723254272 0 <a refstr=HADDEN__AMP__LITHWICK_2016 href=http... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.020 -0.020 12.871&plusmn;0.020 12.620 0.021 -0.021 12.620&plusmn;0.021 12.591 0.022 -0.022
38496 Kepler-33 f f Kepler-33 NaN NaN TIC 158935283 Gaia DR2 2127355923723254272 0 <a refstr=FULTON__AMP__PETIGURA_2018 href=http... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.020 -0.020 12.871&plusmn;0.020 12.620 0.021 -0.021 12.620&plusmn;0.021 12.591 0.022 -0.022
38497 Kepler-33 f f Kepler-33 NaN NaN TIC 158935283 Gaia DR2 2127355923723254272 0 <a refstr=HOLCZER_ET_AL__2016 href=https://ui.... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.020 -0.020 12.871&plusmn;0.020 12.620 0.021 -0.021 12.620&plusmn;0.021 12.591 0.022 -0.022
38498 Kepler-33 f f Kepler-33 NaN NaN TIC 158935283 Gaia DR2 2127355923723254272 0 <a refstr=BERGER_ET_AL__2018 href=https://ui.a... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.020 -0.020 12.871&plusmn;0.020 12.620 0.021 -0.021 12.620&plusmn;0.021 12.591 0.022 -0.022
38499 Kepler-33 f f Kepler-33 NaN NaN TIC 158935283 Gaia DR2 2127355923723254272 0 <a refstr=Q1_Q12_KOI_TABLE href=https://exopla... <a refstr=STASSUN_ET_AL__2019 href=https://ui.... ... 0.020 -0.020 12.871&plusmn;0.020 12.620 0.021 -0.021 12.620&plusmn;0.021 12.591 0.022 -0.022
Full ResokitDataSet - 38500 rows x 354 columns

We see that the number of columns is larger than before:

[14]:
print("# of columns in nasa (to_resokit=True):", nasa.shape[1])
print("# of columns in nasa (to_resokit=False):", nasa_original.shape[1])
# of columns in nasa (to_resokit=True): 49
# of columns in nasa (to_resokit=False): 354

Also, mass and radius are not in the data frame; it has pl_massj and pl_radj instead.

[15]:
print("'mass' in nasa_original:", "mass" in nasa_original.columns)
print("'radius' in nasa_original:", "radius" in nasa_original.columns)

print("'pl_massj' in nasa_original:", "pl_massj" in nasa_original.columns)
print("'pl_radj' in nasa_original:", "pl_radj" in nasa_original.columns)
'mass' in nasa_original: False
'radius' in nasa_original: False
'pl_massj' in nasa_original: True
'pl_radj' in nasa_original: True
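
If we work with the original column names but prefer the short ones, a partial renaming can be sketched with the pandas rename method. The mapping below covers only the two pairs mentioned above and assumes that they correspond to each other; the full mapping used by ResoKit is not shown in this tutorial. The toy values are invented.

```python
import pandas as pd

# Toy frame with two original NASA column names mentioned above (values invented).
original = pd.DataFrame({"pl_massj": [1.0, 0.5], "pl_radj": [1.1, 0.9]})

# Partial, illustrative mapping (an assumption, not ResoKit's actual table).
column_map = {"pl_massj": "mass", "pl_radj": "radius"}

# rename returns a new DataFrame and leaves the original untouched.
renamed = original.rename(columns=column_map)
print(list(renamed.columns))
```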

Default units

The basic default units are:

  • Planet:

    • mass: Jupiter masses

    • radius: Jupiter radii

    • semimajor axis: au

    • period (time): days

  • Star:

    • mass: Solar masses

    • radius: Solar radii

    • distance: parsecs

    • period (time): days
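
To convert from these default units to other common ones, a few helper functions are enough. The sketch below uses the IAU 2015 nominal values for Jupiter and Earth and the exact definitions of the astronomical unit and the parsec. The function names are illustrative, and the catalogues themselves may have been built with slightly different constants.

```python
import math

# IAU 2015 nominal values (Resolution B3) and exact definitions.
GM_JUPITER = 1.2668653e17  # m^3 s^-2
GM_EARTH = 3.986004e14  # m^3 s^-2
R_JUPITER_EQ = 7.1492e7  # m, equatorial radius
R_EARTH_EQ = 6.3781e6  # m, equatorial radius
AU_PER_PARSEC = 648_000 / math.pi  # exact by definition
DAYS_PER_JULIAN_YEAR = 365.25


def jupiter_to_earth_masses(m_jup: float) -> float:
    """Convert Jupiter masses to Earth masses (ratio of GM values)."""
    return m_jup * GM_JUPITER / GM_EARTH


def jupiter_to_earth_radii(r_jup: float) -> float:
    """Convert Jupiter radii to Earth radii (equatorial values)."""
    return r_jup * R_JUPITER_EQ / R_EARTH_EQ


def parsec_to_au(d_pc: float) -> float:
    """Convert parsecs to astronomical units."""
    return d_pc * AU_PER_PARSEC


def days_to_years(p_days: float) -> float:
    """Convert days to Julian years."""
    return p_days / DAYS_PER_JULIAN_YEAR


print(jupiter_to_earth_masses(1.0))
print(jupiter_to_earth_radii(1.0))
print(parsec_to_au(1.0))
```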

Update

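We can use check_outdated to find out when a dataset was last updated on our local machine and whether a newer version is available online. In the output below, the returned tuple (False, True) appears to hold one flag per source: the EU dataset is up to date and the NASA dataset is outdated.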

[16]:
datasets.check_outdated('both')
Checking local dataset from which='eu' source...
 Last modified: 0 days ago.
 Number of planets in stored dataset: 7604
Checking online dataset from eu...
 Number of planets in online dataset: 7604
 Last online update: 2025-08-20 00:00:00 (0 days ago)
Dataset is already up-to-date.

Checking local dataset from which='nasa' source...
 Last modified: 57 days ago.
 Number of planets in stored dataset: 5921
  (Including only default parameters sets.)
Checking online dataset from nasa...
 Number of planets in online dataset: 5983
 Last online update: 2025-08-14 00:00:00 (6 days ago)
The online dataset has more rows than the stored dataset.
The dataset is outdated.
[16]:
(False, True)

To update a whole dataset, we can use the download function with overwrite=True. If check_outd=True, the local dataset is checked first and the download is skipped when it is already up to date, as in the output below.

[17]:
datasets.download('eu', overwrite=True)
Checking local dataset from which='eu' source...
 Last modified: 0 days ago.
 Number of planets in stored dataset: 7604
Checking online dataset from eu...
 Number of planets in online dataset: 7604
 Last online update: 2025-08-20 00:00:00 (0 days ago)
Dataset is already up-to-date.
No need to download the dataset. Set check_outd=False to really force it.

To store the in-memory dataset in a file, we can use the to_file method, or the pandas to_csv method.

[18]:
eu.to_file("my_eu.csv")
Dataset saved to my_eu.csv.

Another possibility is the update function. It is a wrapper around download that queries only the rows updated since the latest row update in the local dataset.

NOTE: This feature is not (yet) available for the EU dataset.

Let’s show it with the NASA dataset.

[19]:
datasets.update("nasa", overwrite=True)
 Loaded full dataset from memory stored datasets.
Checking local dataset from which='nasa' source...
 Last modified: 57 days ago.
 Number of planets in stored dataset: 5921
  (Including only default parameters sets.)
Checking online dataset from nasa...
 Number of planets in online dataset: 5983
 Last online update: 2025-08-14 00:00:00 (6 days ago)
The online dataset has more rows than the stored dataset.
The dataset is outdated.
Latest row update in local dataset: 2025-06-11
Querying online rows update after that date.
Executing the query...
Query executed successfully.
Amount of rows downloaded: 280
 Rows in only old | both | only new: 38428 | 72 | 208
Removed files: nasa.csv from /home/egianuzzi/.resokit_data/nasa_exoplanets.zip
 Written nasa.csv to /home/egianuzzi/.resokit_data/nasa_exoplanets.zip.
Updated stored index in memory.
Stored dataset in memory.
[19]:
PosixPath('/home/egianuzzi/.resokit_data/nasa_exoplanets.zip/nasa.csv')
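
The counts in the log above are consistent with a key-based merge: 72 + 208 = 280 downloaded rows, and 38428 + 72 = 38500 rows in the previously stored dataset. The sketch below illustrates that kind of merge on toy rows. The choice of (name, reference) as the row key is an assumption made for this example, and the function is not ResoKit's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def merge_updates(
    old_rows: List[Row], new_rows: List[Row], key_fields: Sequence[str]
) -> Tuple[List[Row], Tuple[int, int, int]]:
    """Merge new rows into old ones; rows sharing a key are replaced.

    Returns the merged rows and the counts (only old, both, only new).
    """

    def key(row: Row) -> tuple:
        return tuple(row[field] for field in key_fields)

    old_by_key = {key(row): row for row in old_rows}
    new_by_key = {key(row): row for row in new_rows}
    both = old_by_key.keys() & new_by_key.keys()
    only_old = len(old_by_key) - len(both)
    only_new = len(new_by_key) - len(both)
    merged = {**old_by_key, **new_by_key}  # new rows win on shared keys
    return list(merged.values()), (only_old, len(both), only_new)


# Toy rows with invented values.
old = [
    {"name": "planet_a", "reference": "ref_1", "P": 3.0},
    {"name": "planet_b", "reference": "ref_1", "P": 10.0},
]
new = [
    {"name": "planet_b", "reference": "ref_1", "P": 10.5},
    {"name": "planet_c", "reference": "ref_2", "P": 42.0},
]

merged, counts = merge_updates(old, new, key_fields=("name", "reference"))
print(counts)
```

Because shared keys are overwritten by the new rows, the merged table keeps a single, updated copy of each of them.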

If an update is requested and check_outd is True, no new rows are downloaded unless they are needed.

[20]:
datasets.update("nasa", overwrite=True)
 Loaded full dataset from memory stored datasets.
Checking local dataset from which='nasa' source...
 Last modified: 0 days ago.
 Number of planets in stored dataset: 5998
  (Including only default parameters sets.)
Checking online dataset from nasa...
 Number of planets in online dataset: 5983
 Last online update: 2025-08-14 00:00:00 (6 days ago)
The online dataset has less rows than the stored dataset.
 This could be the result of some online row(s) deleted.
 Although this is usually not a problem, running
`resokit.datasets.download(which='nasa')` could solve it if needed.
No need to download the dataset. Set check_outd=False to really force it.