%pip install altair polars vegafusion vl-convert-python

GNSS Observations Tutorial#

In this tutorial we will explore EarthScope’s new API for fetching GNSS observations (obs) and how the SDK facilitates easy (and fast) access:

  • no more parsing RINEX

  • only download what you want

  • easy interoperability with common data science libraries via Apache Arrow

import datetime as dt

import polars as pl

from earthscope_sdk import AsyncEarthScopeClient

es = AsyncEarthScopeClient()

Use the client to retrieve GNSS observations#

Use es.data.gnss_observations(...) to fetch GNSS observations from api.earthscope.org.

While the endpoint on api.earthscope.org has restrictions (e.g., only one session at a time, a 48-hour max time range), this Python method transparently fans out requests to satisfy your entire query.

The results come back as a single PyArrow Table.

# Request 6 hours of data spanning a day boundary (3 hour arcs on either side of midnight)
table = await es.data.gnss_observations(
    start_datetime=dt.datetime(2025, 7, 20, 21),
    end_datetime=dt.datetime(2025, 7, 21, 3),
    station_name="AC60",
    session_name="A",
).fetch()
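As a quick sanity check, we can inspect the table’s size and schema directly with standard PyArrow attributes (an optional aside):

# Peek at the PyArrow table before converting it to a dataframe
print(table.num_rows)
print(table.schema)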

Convert Arrow to DataFrame#

Choose your favorite dataframe library to interact with the results.
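For example, pandas users can convert the same Arrow table directly (assuming pandas is installed):

# Optional: pandas instead of Polars
df_pd = table.to_pandas()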

In this tutorial, we use Polars. Polars allows us to work with the returned Arrow table in a zero-copy (very efficient!) manner.

In the resulting dataframe, we can see all of the columns returned by default. These are the same data you’re used to working with in RINEX.

df = pl.from_arrow(table).sort("timestamp")
df
shape: (349_571, 11)

| timestamp | satellite | obs_code | range | phase | snr | slip | flags | fcn | system | igs |
|---|---|---|---|---|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f64 | f64 | f32 | u16 | u16 | i8 | str | str |
| 2025-07-20 21:00:00 UTC | 1 | "1C" | 2.0777e7 | 1.0919e8 | 48.0 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "1L" | 2.0777e7 | 1.0919e8 | 48.5 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "1W" | 2.0777e7 | null | 42.0 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "2L" | 2.0777e7 | 8.5080e7 | 48.75 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "2W" | 2.0777e7 | 8.5080e7 | 42.0 | null | null | 0 | "G" | "AC6000USA" |
| … | … | … | … | … | … | … | … | … | … | … |
| 2025-07-21 02:59:45 UTC | 7 | "1C" | 3.9943e7 | 2.0990e8 | 32.25 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "1L" | 3.9943e7 | 2.0990e8 | 32.0 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "1Z" | 3.9943e7 | 2.0990e8 | 39.0 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "2L" | 3.9943e7 | 1.6356e8 | 36.75 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "5Q" | 3.9943e7 | 1.5674e8 | 33.75 | null | null | 0 | "J" | "AC6000USA" |

Slicing and Dicing#

This new API facilitates efficiently selecting subsets (slicing) of EarthScope’s dataset so that only the data you want is returned.

You may have noticed in the query above that we were not restricted to 1-day “chunks” like RINEX files. Instead, we asked for an arbitrary 6-hour window that crosses a day boundary.

In the following example, we’ll look at 2 months of just SNR for only one obs code from one satellite. Only the requested data will be downloaded. This is significantly more efficient than downloading RINEX, which contains all fields for all systems, satellites, and observations.

Note that in the resulting dataframe we only have snr (no phase, range, etc.), but of course we can request those fields too if we want!

table = await es.data.gnss_observations(
    start_datetime=dt.datetime(2025, 7, 20),
    end_datetime=dt.datetime(2025, 9, 20),
    station_name="AC60",
    session_name="A",
    system="G",
    obs_code="1C",
    satellite="7",
    field="snr",
).fetch()
df = pl.from_arrow(table).sort("timestamp")
df
shape: (98_311, 6)

| timestamp | satellite | obs_code | snr | system | igs |
|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f32 | str | str |
| 2025-07-20 01:13:15 UTC | 7 | "1C" | 29.5 | "G" | "AC6000USA" |
| 2025-07-20 01:13:30 UTC | 7 | "1C" | 26.0 | "G" | "AC6000USA" |
| 2025-07-20 01:14:15 UTC | 7 | "1C" | 33.25 | "G" | "AC6000USA" |
| 2025-07-20 01:14:30 UTC | 7 | "1C" | 30.25 | "G" | "AC6000USA" |
| 2025-07-20 01:14:45 UTC | 7 | "1C" | 32.0 | "G" | "AC6000USA" |
| … | … | … | … | … | … |
| 2025-08-31 23:58:45 UTC | 7 | "1C" | 46.5 | "G" | "AC6000USA" |
| 2025-08-31 23:59:00 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
| 2025-08-31 23:59:15 UTC | 7 | "1C" | 46.5 | "G" | "AC6000USA" |
| 2025-08-31 23:59:30 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
| 2025-08-31 23:59:45 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |

Visualization#

Now let’s make a plot of the mean SNR in 5-minute time bins for the entire result set.

agg_df = (
    df.with_columns(
        [
            # Create a new column with the time truncated to the desired time resolution
            pl.col("timestamp").dt.truncate("5m").alias("time_bin"),
            # Create a new composite column with the system, satellite, and obs code concatenated
            pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
                "system_sat_obs"
            ),
        ]
    )
    # Group by the new time bin and system/sat/obs composite column
    .group_by(["time_bin", "system_sat_obs"])
    # Aggregate the SNR data by taking the mean
    .agg(pl.col("snr").mean().alias("mean_snr"))
    .sort("time_bin")
)
agg_df
shape: (5_005, 3)

| time_bin | system_sat_obs | mean_snr |
|---|---|---|
| datetime[ms, UTC] | str | f32 |
| 2025-07-20 01:10:00 UTC | "G-7-1C" | 30.200001 |
| 2025-07-20 01:15:00 UTC | "G-7-1C" | 33.325001 |
| 2025-07-20 01:20:00 UTC | "G-7-1C" | 34.299999 |
| 2025-07-20 01:25:00 UTC | "G-7-1C" | 34.900002 |
| 2025-07-20 01:30:00 UTC | "G-7-1C" | 36.075001 |
| … | … | … |
| 2025-08-31 23:35:00 UTC | "G-7-1C" | 44.412498 |
| 2025-08-31 23:40:00 UTC | "G-7-1C" | 44.724998 |
| 2025-08-31 23:45:00 UTC | "G-7-1C" | 45.450001 |
| 2025-08-31 23:50:00 UTC | "G-7-1C" | 45.700001 |
| 2025-08-31 23:55:00 UTC | "G-7-1C" | 46.212502 |
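As an aside (an equivalent alternative, not part of the original flow), Polars can express the same fixed-width time binning with group_by_dynamic instead of truncate plus group_by:

# Equivalent aggregation using Polars' dynamic windowing
# (the window start comes back in the "timestamp" column rather than "time_bin")
agg_df_alt = (
    df.with_columns(
        pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
            "system_sat_obs"
        )
    )
    .sort("timestamp")
    # 5 minute windows over timestamp, one series per system/sat/obs
    .group_by_dynamic("timestamp", every="5m", group_by="system_sat_obs")
    .agg(pl.col("snr").mean().alias("mean_snr"))
)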

Efficient Visualization#

Polars works with Altair for visualization. Altair is much faster when using the Rust-based VegaFusion library as its data transformer backend.

# Enable rust backend for altair
import altair as alt

_ = alt.data_transformers.enable("vegafusion")

Plotting satellite arcs#

The Polars dataframe has a useful .plot attribute for creating different types of plots. Below, we create a scatter plot using .plot.point(...).

agg_df.plot.point(
    x="time_bin",
    y="mean_snr",
    color="system_sat_obs",
).properties(width=800, height=300)
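A side note: since vl-convert-python is installed (see the pip install at the top of this tutorial), the same chart can also be exported as a static image. A minimal sketch (the output filename is arbitrary):

# Export the chart to a static PNG (rendered via vl-convert)
chart = agg_df.plot.point(
    x="time_bin",
    y="mean_snr",
    color="system_sat_obs",
).properties(width=800, height=300)
chart.save("mean_snr.png")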

Putting it all together#

Now let’s grab some data across multiple stations for multiple satellites and codes.

# Fetch a week of data
start = dt.datetime(2025, 7, 20)
end = start + dt.timedelta(days=7)

table = await es.data.gnss_observations(
    start_datetime=start,
    end_datetime=end,
    station_name=["P717", "P453", "P146", "P147", "P041"],
    session_name="A",
    system=["G", "R"],
    obs_code=["1C", "2L"],
    satellite=["7", "21", "28"],
    field=["snr", "range"],
).fetch()

# Arrow table to Polars dataframe
df = pl.from_arrow(table)
# Aggregation
agg_df = (
    df.with_columns(
        [
            pl.col("timestamp").dt.truncate("1m").alias("time_bin"),
            pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
                "system_sat_obs"
            ),
        ]
    )
    .group_by(["time_bin", "system_sat_obs"])
    .agg(
        pl.col("snr").mean().alias("mean_snr"),
        pl.col("range").max().alias("max_range"),
    )
    .sort("time_bin")
)
# plotting
agg_df.plot.point(
    x="time_bin",
    y="mean_snr",
    color="system_sat_obs",
).properties(width=800, height=300)
# plotting
agg_df.plot.point(
    x="time_bin",
    y="max_range",
    color="system_sat_obs",
).properties(width=800, height=300)

Geometry Free#

The geometry-free combination cancels the geometric part of the measurement, leaving all the frequency-dependent effects (i.e., ionospheric refraction, instrumental delays, wind-up) as well as multipath and measurement noise. It can be used to estimate the ionospheric electron content, to detect cycle slips in the carrier phase, or to estimate antenna rotations.

More info at Navipedia
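In symbols (standard definitions that match what the code below computes), with pseudoranges $P$ in meters and carrier phases $\Phi$ in cycles:

$$P_{GF} = P_{2W} - P_{1C}, \qquad L_{GF} = \lambda_1 \Phi_{1C} - \lambda_2 \Phi_{2W}$$

where $\lambda_1 = c/f_1$ and $\lambda_2 = c/f_2$ are the GPS L1 and L2 carrier wavelengths.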

# Fetch a single arc of data
start = dt.datetime(2025, 7, 19, 22)
end = start + dt.timedelta(hours=18)

table = await es.data.gnss_observations(
    start_datetime=start,
    end_datetime=end,
    station_name="AC60",
    session_name="A",
    system=["G"],
    obs_code=["1C", "2W"],
    satellite=["7"],
    field=["phase", "range"],
).fetch()

# Arrow table to Polars dataframe
df = pl.from_arrow(table).sort("timestamp")
df
shape: (2_788, 7)

| timestamp | satellite | obs_code | range | phase | system | igs |
|---|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f64 | f64 | str | str |
| 2025-07-20 01:13:15 UTC | 7 | "1C" | 2.5723e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:13:30 UTC | 7 | "1C" | 2.5714e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:14:15 UTC | 7 | "1C" | 2.5687e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:14:30 UTC | 7 | "1C" | 2.5677e7 | 1.3494e8 | "G" | "AC6000USA" |
| 2025-07-20 01:14:45 UTC | 7 | "1C" | 2.5668e7 | 1.3489e8 | "G" | "AC6000USA" |
| … | … | … | … | … | … | … |
| 2025-07-20 07:02:00 UTC | 7 | "2W" | 2.5648e7 | 1.0502e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:30 UTC | 7 | "1C" | 2.5665e7 | 1.3487e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:30 UTC | 7 | "2W" | 2.5665e7 | 1.0510e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:45 UTC | 7 | "1C" | 2.5674e7 | 1.3492e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:45 UTC | 7 | "2W" | 2.5674e7 | 1.0513e8 | "G" | "AC6000USA" |
# Reshape dataframe to enable diffing between 1C and 2W observations
# Pivot the obs_code column so that 1C and 2W become separate columns
df_pivoted = df.pivot(
    values=["range", "phase"],  # Include both range and phase for comparison
    index=["igs", "timestamp", "satellite", "system"],
    on="obs_code",
)

freq1 = 1575.42e6  # GPS L1 carrier frequency (Hz)
freq2 = 1227.60e6  # GPS L2 carrier frequency (Hz)
c = 299792458  # speed of light (m/s)
lambda1 = c / freq1  # wavelength of 1C (L1)
lambda2 = c / freq2  # wavelength of 2W (L2)

# Now you can easily calculate the difference between 1C and 2W observations
gf_df = df_pivoted.with_columns(
    [
        (pl.col("range_2W") - pl.col("range_1C")).alias("gf_range"),
        ((pl.col("phase_1C") * lambda1) - (pl.col("phase_2W") * lambda2)).alias(
            "gf_phase"
        ),
    ]
)
gf_df
shape: (1_396, 10)

| igs | timestamp | satellite | system | range_1C | range_2W | phase_1C | phase_2W | gf_range | gf_phase |
|---|---|---|---|---|---|---|---|---|---|
| str | datetime[ms, UTC] | u8 | str | f64 | f64 | f64 | f64 | f64 | f64 |
| "AC6000USA" | 2025-07-20 01:13:15 UTC | 7 | "G" | 2.5723e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:13:30 UTC | 7 | "G" | 2.5714e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:15 UTC | 7 | "G" | 2.5687e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:30 UTC | 7 | "G" | 2.5677e7 | null | 1.3494e8 | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:45 UTC | 7 | "G" | 2.5668e7 | 2.5668e7 | 1.3489e8 | 1.0511e8 | 7.695 | -6.230897 |
| … | … | … | … | … | … | … | … | … | … |
| "AC6000USA" | 2025-07-20 07:01:30 UTC | 7 | "G" | 2.5630e7 | 2.5630e7 | 1.3468e8 | 1.0495e8 | 1.909 | -11.247463 |
| "AC6000USA" | 2025-07-20 07:01:45 UTC | 7 | "G" | 2.5639e7 | 2.5639e7 | 1.3473e8 | 1.0499e8 | 2.074 | -11.249344 |
| "AC6000USA" | 2025-07-20 07:02:00 UTC | 7 | "G" | 2.5648e7 | 2.5648e7 | 1.3478e8 | 1.0502e8 | 2.66 | -11.258719 |
| "AC6000USA" | 2025-07-20 07:02:30 UTC | 7 | "G" | 2.5665e7 | 2.5665e7 | 1.3487e8 | 1.0510e8 | 2.821 | -11.268065 |
| "AC6000USA" | 2025-07-20 07:02:45 UTC | 7 | "G" | 2.5674e7 | 2.5674e7 | 1.3492e8 | 1.0513e8 | 1.636 | -11.277862 |
# Convert to long format for easier plotting
gf_df_long = gf_df.unpivot(
    ["gf_range", "gf_phase"],
    index=["igs", "timestamp", "satellite", "system"],
)
gf_df_long

# Plotting
gf_df_long.plot.point(x="timestamp", y="value", color="variable").properties(
    width=800, height=300
)
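As a toy illustration of the cycle-slip detection mentioned above (a naive sketch, not a tuned detector: the 0.05 m jump threshold is an arbitrary assumption, and a real detector would also need to account for data gaps):

# Flag suspiciously large epoch-to-epoch jumps in the geometry-free phase
slip_candidates = gf_df.sort("timestamp").with_columns(
    pl.col("gf_phase").diff().abs().alias("gf_phase_step")
).filter(pl.col("gf_phase_step") > 0.05)
slip_candidates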

Larger-than-memory requests#

Sometimes the amount of data you want to work with is larger than what fits in your machine’s memory. For such a scenario, you can use the query plan to easily iterate over subsets of the overall result set.

Note

Take a look at memory-efficient access using Query Plans for more details.

In the following example, we want to process an entire network one day at a time.

# Create a query plan to fetch a week of data
start = dt.datetime(2025, 7, 20)
end = start + dt.timedelta(days=7)

plan = es.data.gnss_observations(
    start_datetime=start,
    end_datetime=end,
    network_name="PERM:Alaska",
    session_name="A",
    system="G",
    field=["phase", "range", "snr"],
)


# Helper function to print information about a table
def print_table_info(table):
    df = pl.from_arrow(table)
    ux_igs = list(df["igs"].unique().sort())
    if len(ux_igs) > 10:
        # Too many stations to list individually; just print how many there are
        ux_igs = len(ux_igs)
    min_time = df["timestamp"].min()
    max_time = df["timestamp"].max()
    count = len(df)
    print(count, min_time, max_time, ux_igs)

Request groups effectively limit the concurrency of requests made to api.earthscope.org, as well as how much data is loaded into memory at once.

In the following example, we create request groups for each day. All requests in a single group are executed in parallel and collected into a single Table. You can then do whatever you want with that table before proceeding to the next iteration of the loop where another request group is used to create the next table.

async for table in plan.group_by_day():
    print_table_info(table)
1425616 2025-07-20 00:00:00+00:00 2025-07-20 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1423432 2025-07-21 00:00:00+00:00 2025-07-21 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1426148 2025-07-22 00:00:00+00:00 2025-07-22 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1425577 2025-07-23 00:00:00+00:00 2025-07-23 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1425262 2025-07-24 00:00:00+00:00 2025-07-24 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1424973 2025-07-25 00:00:00+00:00 2025-07-25 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1423799 2025-07-26 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']

We can just as easily process the result set transposed: the entire time window but one station at a time:

async for table in plan.sort_by_station().group_by_station():
    print_table_info(table)
2097697 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['ATW200USA']
1903112 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['CGNN00USA']
1979093 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['CLGO00USA']
2195651 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['GRNX00USA']
1799254 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['SELD00USA']
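If you want to persist each subset rather than just inspect it, one possible pattern (a sketch using pyarrow.parquet; the filenames are arbitrary) is to write each station's table to disk before fetching the next:

import pyarrow.parquet as pq

async for table in plan.sort_by_station().group_by_station():
    # One station per iteration: name the output file after its IGS id
    station = table["igs"][0].as_py()
    pq.write_table(table, f"{station}.parquet")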