%pip install altair polars vegafusion vl-convert-python

GNSS Observations Tutorial#

In this tutorial we will explore EarthScope’s new API for fetching GNSS observations (obs) and how the SDK facilitates easy (and fast) access:

  • no more parsing RINEX

  • only download what you want

  • easy interoperability with common data science libraries via Apache Arrow

import datetime as dt

import polars as pl

from earthscope_sdk import AsyncEarthScopeClient

es = AsyncEarthScopeClient()

Use the client to retrieve GNSS observations#

Use es.data.gnss_observations(...) to fetch GNSS observations from api.earthscope.org.

While the endpoint on api.earthscope.org has restrictions (e.g., only one session at a time, a 48-hour max time range), this Python method transparently fans out requests to satisfy your entire query.

The results come back as a single PyArrow Table.

# Request 6 hours of data spanning a day boundary (3 hour arcs on either side of midnight)
table = await es.data.gnss_observations(
    start_datetime=dt.datetime(2025, 7, 20, 21),
    end_datetime=dt.datetime(2025, 7, 21, 3),
    station_name="AC60",
    session_name="A",
).fetch()
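As a quick sanity check, we can inspect the table’s size and schema directly with standard PyArrow attributes (an optional aside):

# Peek at the PyArrow table before converting it to a dataframe
print(table.num_rows)
print(table.schema)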

Convert Arrow to DataFrame#

Choose your favorite dataframe library to interact with the results.
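For example, pandas users can convert the same Arrow table directly (assuming pandas is installed):

# Optional: pandas instead of Polars
df_pd = table.to_pandas()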

In this tutorial, we use Polars. Polars allows us to work with the returned Arrow table in a zero-copy (very efficient!) manner.

In the resulting dataframe, we can see all of the columns returned by default. These are the same data you’re used to working with in RINEX.

df = pl.from_arrow(table).sort("timestamp")
df
shape: (349_571, 11)

| timestamp | satellite | obs_code | range | phase | snr | slip | flags | fcn | system | igs |
|---|---|---|---|---|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f64 | f64 | f32 | u16 | u16 | i8 | str | str |
| 2025-07-20 21:00:00 UTC | 1 | "1C" | 2.0777e7 | 1.0919e8 | 48.0 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "1L" | 2.0777e7 | 1.0919e8 | 48.5 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "1W" | 2.0777e7 | null | 42.0 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "2L" | 2.0777e7 | 8.5080e7 | 48.75 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "2W" | 2.0777e7 | 8.5080e7 | 42.0 | null | null | 0 | "G" | "AC6000USA" |
| … | … | … | … | … | … | … | … | … | … | … |
| 2025-07-21 02:59:45 UTC | 7 | "1C" | 3.9943e7 | 2.0990e8 | 32.25 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "1L" | 3.9943e7 | 2.0990e8 | 32.0 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "1Z" | 3.9943e7 | 2.0990e8 | 39.0 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "2L" | 3.9943e7 | 1.6356e8 | 36.75 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "5Q" | 3.9943e7 | 1.5674e8 | 33.75 | null | null | 0 | "J" | "AC6000USA" |

Slicing and Dicing#

This new API facilitates efficiently selecting subsets (slicing) of EarthScope’s dataset so that only the data you want is returned.

You may have noticed in the query above that we were not restricted to 1-day “chunks” like RINEX files. Instead, we asked for an arbitrary 6-hour window that crosses a day boundary.

In the following example, we’ll look at 2 months of just SNR for only one obs code from one satellite. Only the requested data will be downloaded. This is significantly more efficient than downloading RINEX, which contains all fields for all systems, satellites, and observations.

Note that in the resulting dataframe we only have snr (no phase, range, etc.), but of course we can request those fields too if we want!

table = await es.data.gnss_observations(
    start_datetime=dt.datetime(2025, 7, 20),
    end_datetime=dt.datetime(2025, 9, 20),
    station_name="AC60",
    session_name="A",
    system="G",
    obs_code="1C",
    satellite="7",
    field="snr",
).fetch()
df = pl.from_arrow(table).sort("timestamp")
df
shape: (98_311, 6)

| timestamp | satellite | obs_code | snr | system | igs |
|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f32 | str | str |
| 2025-07-20 01:13:15 UTC | 7 | "1C" | 29.5 | "G" | "AC6000USA" |
| 2025-07-20 01:13:30 UTC | 7 | "1C" | 26.0 | "G" | "AC6000USA" |
| 2025-07-20 01:14:15 UTC | 7 | "1C" | 33.25 | "G" | "AC6000USA" |
| 2025-07-20 01:14:30 UTC | 7 | "1C" | 30.25 | "G" | "AC6000USA" |
| 2025-07-20 01:14:45 UTC | 7 | "1C" | 32.0 | "G" | "AC6000USA" |
| … | … | … | … | … | … |
| 2025-08-31 23:58:45 UTC | 7 | "1C" | 46.5 | "G" | "AC6000USA" |
| 2025-08-31 23:59:00 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
| 2025-08-31 23:59:15 UTC | 7 | "1C" | 46.5 | "G" | "AC6000USA" |
| 2025-08-31 23:59:30 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
| 2025-08-31 23:59:45 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |

Visualization#

Now let’s make a plot of the mean SNR in 5-minute time bins for the entire result set.

agg_df = (
    df.with_columns(
        [
            # Create a new column with the time truncated to the desired time resolution
            pl.col("timestamp").dt.truncate("5m").alias("time_bin"),
            # Create a new composite column with the system, satellite, and obs code concatenated
            pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
                "system_sat_obs"
            ),
        ]
    )
    # Group by the new time bin and system/sat/obs composite column
    .group_by(["time_bin", "system_sat_obs"])
    # Aggregate the SNR data by taking the mean
    .agg(pl.col("snr").mean().alias("mean_snr"))
    .sort("time_bin")
)
agg_df
shape: (5_005, 3)

| time_bin | system_sat_obs | mean_snr |
|---|---|---|
| datetime[ms, UTC] | str | f32 |
| 2025-07-20 01:10:00 UTC | "G-7-1C" | 30.200001 |
| 2025-07-20 01:15:00 UTC | "G-7-1C" | 33.325001 |
| 2025-07-20 01:20:00 UTC | "G-7-1C" | 34.299999 |
| 2025-07-20 01:25:00 UTC | "G-7-1C" | 34.900002 |
| 2025-07-20 01:30:00 UTC | "G-7-1C" | 36.075001 |
| … | … | … |
| 2025-08-31 23:35:00 UTC | "G-7-1C" | 44.412498 |
| 2025-08-31 23:40:00 UTC | "G-7-1C" | 44.724998 |
| 2025-08-31 23:45:00 UTC | "G-7-1C" | 45.450001 |
| 2025-08-31 23:50:00 UTC | "G-7-1C" | 45.700001 |
| 2025-08-31 23:55:00 UTC | "G-7-1C" | 46.212502 |
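As an aside (an equivalent alternative, not part of the original flow), Polars can express the same fixed-width time binning with group_by_dynamic instead of truncate plus group_by:

# Equivalent aggregation using Polars' dynamic windowing
# (the window start comes back in the "timestamp" column rather than "time_bin")
agg_df_alt = (
    df.with_columns(
        pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
            "system_sat_obs"
        )
    )
    .sort("timestamp")
    # 5 minute windows over timestamp, one series per system/sat/obs
    .group_by_dynamic("timestamp", every="5m", group_by="system_sat_obs")
    .agg(pl.col("snr").mean().alias("mean_snr"))
)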

Efficient Visualization#

Polars works with Altair for visualization. Altair is much faster when using the Rust-based VegaFusion library as its data transformer backend.

# Enable rust backend for altair
import altair as alt

_ = alt.data_transformers.enable("vegafusion")

Plotting satellite arcs#

The Polars dataframe has a useful .plot attribute for creating different types of plots. Below, we create a scatter plot using .plot.point(...).

agg_df.plot.point(
    x="time_bin",
    y="mean_snr",
    color="system_sat_obs",
).properties(width=800, height=300)
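A side note: since vl-convert-python is installed (see the pip install at the top of this tutorial), the same chart can also be exported as a static image. A minimal sketch (the output filename is arbitrary):

# Export the chart to a static PNG (rendered via vl-convert)
chart = agg_df.plot.point(
    x="time_bin",
    y="mean_snr",
    color="system_sat_obs",
).properties(width=800, height=300)
chart.save("mean_snr.png")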

Putting it all together#

Now let’s grab some data across multiple stations for multiple satellites and codes.

# Fetch a week of data
start = dt.datetime(2025, 7, 20)
end = start + dt.timedelta(days=7)

table = await es.data.gnss_observations(
    start_datetime=start,
    end_datetime=end,
    station_name=["P717", "P453", "P146", "P147", "P041"],
    session_name="A",
    system=["G", "R"],
    obs_code=["1C", "2L"],
    satellite=["7", "21", "28"],
    field=["snr", "range"],
).fetch()

# Arrow table to Polars dataframe
df = pl.from_arrow(table)
# Aggregation
agg_df = (
    df.with_columns(
        [
            pl.col("timestamp").dt.truncate("1m").alias("time_bin"),
            pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
                "system_sat_obs"
            ),
        ]
    )
    .group_by(["time_bin", "system_sat_obs"])
    .agg(
        pl.col("snr").mean().alias("mean_snr"),
        pl.col("range").max().alias("max_range"),
    )
    .sort("time_bin")
)
# plotting
agg_df.plot.point(
    x="time_bin",
    y="mean_snr",
    color="system_sat_obs",
).properties(width=800, height=300)
# plotting
agg_df.plot.point(
    x="time_bin",
    y="max_range",
    color="system_sat_obs",
).properties(width=800, height=300)

Geometry Free#

The geometry-free combination cancels the geometric part of the measurement, leaving all the frequency-dependent effects (i.e., ionospheric refraction, instrumental delays, wind-up) as well as multipath and measurement noise. It can be used to estimate the ionospheric electron content, to detect cycle slips in the carrier phase, or to estimate antenna rotations.

More info at Navipedia
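In symbols (standard definitions that match what the code below computes), with pseudoranges $P$ in meters and carrier phases $\Phi$ in cycles:

$$P_{GF} = P_{2W} - P_{1C}, \qquad L_{GF} = \lambda_1 \Phi_{1C} - \lambda_2 \Phi_{2W}$$

where $\lambda_1 = c/f_1$ and $\lambda_2 = c/f_2$ are the GPS L1 and L2 carrier wavelengths.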

# Fetch a single arc of data
start = dt.datetime(2025, 7, 19, 22)
end = start + dt.timedelta(hours=18)

table = await es.data.gnss_observations(
    start_datetime=start,
    end_datetime=end,
    station_name="AC60",
    session_name="A",
    system=["G"],
    obs_code=["1C", "2W"],
    satellite=["7"],
    field=["phase", "range"],
).fetch()

# Arrow table to Polars dataframe
df = pl.from_arrow(table).sort("timestamp")
df
shape: (2_788, 7)

| timestamp | satellite | obs_code | range | phase | system | igs |
|---|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f64 | f64 | str | str |
| 2025-07-20 01:13:15 UTC | 7 | "1C" | 2.5723e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:13:30 UTC | 7 | "1C" | 2.5714e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:14:15 UTC | 7 | "1C" | 2.5687e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:14:30 UTC | 7 | "1C" | 2.5677e7 | 1.3494e8 | "G" | "AC6000USA" |
| 2025-07-20 01:14:45 UTC | 7 | "1C" | 2.5668e7 | 1.3489e8 | "G" | "AC6000USA" |
| … | … | … | … | … | … | … |
| 2025-07-20 07:02:00 UTC | 7 | "2W" | 2.5648e7 | 1.0502e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:30 UTC | 7 | "1C" | 2.5665e7 | 1.3487e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:30 UTC | 7 | "2W" | 2.5665e7 | 1.0510e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:45 UTC | 7 | "1C" | 2.5674e7 | 1.3492e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:45 UTC | 7 | "2W" | 2.5674e7 | 1.0513e8 | "G" | "AC6000USA" |
# Reshape dataframe to enable diffing between 1C and 2W observations
# Pivot the obs_code column so that 1C and 2W become separate columns
df_pivoted = df.pivot(
    values=["range", "phase"],  # Include both range and phase for comparison
    index=["igs", "timestamp", "satellite", "system"],
    on="obs_code",
)

freq1 = 1575.42e6  # GPS L1 carrier frequency (Hz)
freq2 = 1227.60e6  # GPS L2 carrier frequency (Hz)
c = 299792458  # speed of light (m/s)
lambda1 = c / freq1  # wavelength of 1C (L1)
lambda2 = c / freq2  # wavelength of 2W (L2)

# Now you can easily calculate the difference between 1C and 2W observations
gf_df = df_pivoted.with_columns(
    [
        (pl.col("range_2W") - pl.col("range_1C")).alias("gf_range"),
        ((pl.col("phase_1C") * lambda1) - (pl.col("phase_2W") * lambda2)).alias(
            "gf_phase"
        ),
    ]
)
gf_df
shape: (1_396, 10)

| igs | timestamp | satellite | system | range_1C | range_2W | phase_1C | phase_2W | gf_range | gf_phase |
|---|---|---|---|---|---|---|---|---|---|
| str | datetime[ms, UTC] | u8 | str | f64 | f64 | f64 | f64 | f64 | f64 |
| "AC6000USA" | 2025-07-20 01:13:15 UTC | 7 | "G" | 2.5723e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:13:30 UTC | 7 | "G" | 2.5714e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:15 UTC | 7 | "G" | 2.5687e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:30 UTC | 7 | "G" | 2.5677e7 | null | 1.3494e8 | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:45 UTC | 7 | "G" | 2.5668e7 | 2.5668e7 | 1.3489e8 | 1.0511e8 | 7.695 | -6.230897 |
| … | … | … | … | … | … | … | … | … | … |
| "AC6000USA" | 2025-07-20 07:01:30 UTC | 7 | "G" | 2.5630e7 | 2.5630e7 | 1.3468e8 | 1.0495e8 | 1.909 | -11.247463 |
| "AC6000USA" | 2025-07-20 07:01:45 UTC | 7 | "G" | 2.5639e7 | 2.5639e7 | 1.3473e8 | 1.0499e8 | 2.074 | -11.249344 |
| "AC6000USA" | 2025-07-20 07:02:00 UTC | 7 | "G" | 2.5648e7 | 2.5648e7 | 1.3478e8 | 1.0502e8 | 2.66 | -11.258719 |
| "AC6000USA" | 2025-07-20 07:02:30 UTC | 7 | "G" | 2.5665e7 | 2.5665e7 | 1.3487e8 | 1.0510e8 | 2.821 | -11.268065 |
| "AC6000USA" | 2025-07-20 07:02:45 UTC | 7 | "G" | 2.5674e7 | 2.5674e7 | 1.3492e8 | 1.0513e8 | 1.636 | -11.277862 |
# Convert to long format for easier plotting
gf_df_long = gf_df.unpivot(
    ["gf_range", "gf_phase"],
    index=["igs", "timestamp", "satellite", "system"],
)
gf_df_long

# Plotting
gf_df_long.plot.point(x="timestamp", y="value", color="variable").properties(
    width=800, height=300
)
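As a toy illustration of the cycle-slip detection mentioned above (a naive sketch, not a tuned detector: the 0.05 m jump threshold is an arbitrary assumption, and a real detector would also need to account for data gaps):

# Flag suspiciously large epoch-to-epoch jumps in the geometry-free phase
slip_candidates = gf_df.sort("timestamp").with_columns(
    pl.col("gf_phase").diff().abs().alias("gf_phase_step")
).filter(pl.col("gf_phase_step") > 0.05)
slip_candidates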

Larger-than-memory requests#

Sometimes the amount of data you want to work with is larger than what fits in your machine’s memory. For such a scenario, you can use the query plan to easily iterate over subsets of the overall result set.

Note

Take a look at memory-efficient access using Query Plans for more details.

In the following example, we want to process an entire network one day at a time.

# Create a query plan to fetch a week of data
start = dt.datetime(2025, 7, 20)
end = start + dt.timedelta(days=7)

plan = es.data.gnss_observations(
    start_datetime=start,
    end_datetime=end,
    network_name="PERM:Alaska",
    session_name="A",
    system="G",
    field=["phase", "range", "snr"],
)


# Helper function to print information about a table
def print_table_info(table):
    df = pl.from_arrow(table)
    ux_igs = list(df["igs"].unique().sort())
    if len(ux_igs) > 10:
        # Too many stations to list individually; just print how many there are
        ux_igs = len(ux_igs)
    min_time = df["timestamp"].min()
    max_time = df["timestamp"].max()
    count = len(df)
    print(count, min_time, max_time, ux_igs)

Request groups effectively limit the concurrency of requests made to api.earthscope.org, as well as how much data is loaded into memory at once.

In the following example, we create request groups for each day. All requests in a single group are executed in parallel and collected into a single Table. You can then do whatever you want with that table before proceeding to the next iteration of the loop where another request group is used to create the next table.

async for table in plan.group_by_day():
    print_table_info(table)
1425616 2025-07-20 00:00:00+00:00 2025-07-20 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1423432 2025-07-21 00:00:00+00:00 2025-07-21 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1426148 2025-07-22 00:00:00+00:00 2025-07-22 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1425577 2025-07-23 00:00:00+00:00 2025-07-23 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1425262 2025-07-24 00:00:00+00:00 2025-07-24 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1424973 2025-07-25 00:00:00+00:00 2025-07-25 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1423799 2025-07-26 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']

We can just as easily process the result set transposed: the entire time window but one station at a time:

async for table in plan.sort_by_station().group_by_station():
    print_table_info(table)
2097697 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['ATW200USA']
1903112 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['CGNN00USA']
1979093 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['CLGO00USA']
2195651 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['GRNX00USA']
1799254 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['SELD00USA']
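If you want to persist each subset rather than just inspect it, one possible pattern (a sketch using pyarrow.parquet; the filenames are arbitrary) is to write each station's table to disk before fetching the next:

import pyarrow.parquet as pq

async for table in plan.sort_by_station().group_by_station():
    # One station per iteration: name the output file after its IGS id
    station = table["igs"][0].as_py()
    pq.write_table(table, f"{station}.parquet")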