%pip install altair polars vegafusion vl-convert-python
GNSS Observations Tutorial#
In this tutorial we will explore EarthScope’s new API for fetching GNSS observations (obs) and how the SDK facilitates easy (and fast) access:
- no more parsing RINEX
- only download what you want
- easy interoperability with common data science libraries via Apache Arrow
import datetime as dt
import polars as pl
from earthscope_sdk import AsyncEarthScopeClient
es = AsyncEarthScopeClient()
Use the client to retrieve GNSS observations#
Use es.data.gnss_observations(...) to fetch GNSS observations from api.earthscope.org.
While the endpoint in api.earthscope.org has restrictions (e.g. only one session at a time, a 48-hour maximum time range, etc.), this Python method transparently fans out requests to satisfy your entire query.
The results come back as a single PyArrow Table.
Note
Learn more about Query Plans
Learn more about working with Apache Arrow
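Conceptually, the fan-out resembles splitting the requested window into API-sized chunks that are fetched in parallel and concatenated into one table. The sketch below is a simplified illustration only; the `chunk_query` helper and the way the 48-hour limit is applied here are assumptions, not the SDK's actual implementation.

```python
import datetime as dt


def chunk_query(start: dt.datetime, end: dt.datetime, max_span=dt.timedelta(hours=48)):
    """Split [start, end) into sub-ranges no longer than max_span."""
    chunks = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + max_span, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


# A 7-day query becomes three full 48h spans plus a 24h remainder
chunks = chunk_query(dt.datetime(2025, 7, 20), dt.datetime(2025, 7, 27))
print(len(chunks))  # → 4
```

Each sub-range would then map to one request against api.earthscope.org, with the responses collected into a single Arrow table before being returned to you.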
# Request 6 hours of data spanning the day boundary (3 hour arcs on either side of midnight)
table = await es.data.gnss_observations(
start_datetime=dt.datetime(2025, 7, 20, 21),
end_datetime=dt.datetime(2025, 7, 21, 3),
station_name="AC60",
session_name="A",
).fetch()
Convert Arrow to DataFrame#
Choose your favorite dataframe library to interact with the results.
In this tutorial, we use Polars. Polars allows us to work with the returned Arrow table in a zero-copy (very efficient!) manner.
In the resulting dataframe, we can see all of the columns returned by default. These are the same data you’re used to working with in RINEX.
df = pl.from_arrow(table).sort("timestamp")
df
| timestamp | satellite | obs_code | range | phase | snr | slip | flags | fcn | system | igs |
|---|---|---|---|---|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f64 | f64 | f32 | u16 | u16 | i8 | str | str |
| 2025-07-20 21:00:00 UTC | 1 | "1C" | 2.0777e7 | 1.0919e8 | 48.0 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "1L" | 2.0777e7 | 1.0919e8 | 48.5 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "1W" | 2.0777e7 | null | 42.0 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "2L" | 2.0777e7 | 8.5080e7 | 48.75 | null | null | 0 | "G" | "AC6000USA" |
| 2025-07-20 21:00:00 UTC | 1 | "2W" | 2.0777e7 | 8.5080e7 | 42.0 | null | null | 0 | "G" | "AC6000USA" |
| … | … | … | … | … | … | … | … | … | … | … |
| 2025-07-21 02:59:45 UTC | 7 | "1C" | 3.9943e7 | 2.0990e8 | 32.25 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "1L" | 3.9943e7 | 2.0990e8 | 32.0 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "1Z" | 3.9943e7 | 2.0990e8 | 39.0 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "2L" | 3.9943e7 | 1.6356e8 | 36.75 | null | null | 0 | "J" | "AC6000USA" |
| 2025-07-21 02:59:45 UTC | 7 | "5Q" | 3.9943e7 | 1.5674e8 | 33.75 | null | null | 0 | "J" | "AC6000USA" |
Slicing and Dicing#
This new API facilitates efficiently selecting subsets (slicing) of EarthScope’s dataset so that only the data you want is returned.
You may have noticed in the query above that we were not restricted to 1 day “chunks” like RINEX files. Instead, we asked for an arbitrary 6-hour window spanning a day boundary.
In the following example, we’ll look at 2 months of just SNR for a single obs code from a single satellite. Only the requested data will be downloaded. This is significantly more efficient than downloading RINEX, which contains all fields for all systems, all satellites, and all observations.
Note that in the resulting dataframe we only have snr - no phase, range, etc. (but of course we can request those fields too if we want!)
table = await es.data.gnss_observations(
start_datetime=dt.datetime(2025, 7, 20),
end_datetime=dt.datetime(2025, 9, 20),
station_name="AC60",
session_name="A",
system="G",
obs_code="1C",
satellite="7",
field="snr",
).fetch()
df = pl.from_arrow(table).sort("timestamp")
df
| timestamp | satellite | obs_code | snr | system | igs |
|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f32 | str | str |
| 2025-07-20 01:13:15 UTC | 7 | "1C" | 29.5 | "G" | "AC6000USA" |
| 2025-07-20 01:13:30 UTC | 7 | "1C" | 26.0 | "G" | "AC6000USA" |
| 2025-07-20 01:14:15 UTC | 7 | "1C" | 33.25 | "G" | "AC6000USA" |
| 2025-07-20 01:14:30 UTC | 7 | "1C" | 30.25 | "G" | "AC6000USA" |
| 2025-07-20 01:14:45 UTC | 7 | "1C" | 32.0 | "G" | "AC6000USA" |
| … | … | … | … | … | … |
| 2025-08-31 23:58:45 UTC | 7 | "1C" | 46.5 | "G" | "AC6000USA" |
| 2025-08-31 23:59:00 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
| 2025-08-31 23:59:15 UTC | 7 | "1C" | 46.5 | "G" | "AC6000USA" |
| 2025-08-31 23:59:30 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
| 2025-08-31 23:59:45 UTC | 7 | "1C" | 46.25 | "G" | "AC6000USA" |
Visualization#
Now let’s make a plot of the mean SNR in 5-minute time bins for the entire result set.
agg_df = (
df.with_columns(
[
# Create a new column with the time truncated to the desired time resolution
pl.col("timestamp").dt.truncate("5m").alias("time_bin"),
# Create a new composite column with the system, satellite, and obs code concatenated
pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
"system_sat_obs"
),
]
)
# Group by the new time bin and system/sat/obs composite column
.group_by(["time_bin", "system_sat_obs"])
# Aggregate the SNR data by taking the mean
.agg(pl.col("snr").mean().alias("mean_snr"))
.sort("time_bin")
)
agg_df
| time_bin | system_sat_obs | mean_snr |
|---|---|---|
| datetime[ms, UTC] | str | f32 |
| 2025-07-20 01:10:00 UTC | "G-7-1C" | 30.200001 |
| 2025-07-20 01:15:00 UTC | "G-7-1C" | 33.325001 |
| 2025-07-20 01:20:00 UTC | "G-7-1C" | 34.299999 |
| 2025-07-20 01:25:00 UTC | "G-7-1C" | 34.900002 |
| 2025-07-20 01:30:00 UTC | "G-7-1C" | 36.075001 |
| … | … | … |
| 2025-08-31 23:35:00 UTC | "G-7-1C" | 44.412498 |
| 2025-08-31 23:40:00 UTC | "G-7-1C" | 44.724998 |
| 2025-08-31 23:45:00 UTC | "G-7-1C" | 45.450001 |
| 2025-08-31 23:50:00 UTC | "G-7-1C" | 45.700001 |
| 2025-08-31 23:55:00 UTC | "G-7-1C" | 46.212502 |
Efficient Visualization#
Polars works with Altair for visualization. Altair is much faster when using the Rust-based VegaFusion library as a backend.
# Enable rust backend for altair
import altair as alt
_ = alt.data_transformers.enable("vegafusion")
Plotting satellite arcs#
The Polars dataframe has a useful .plot attribute for creating different types of plots. Below, we create a scatter plot using .plot.point(...).
agg_df.plot.point(
x="time_bin",
y="mean_snr",
color="system_sat_obs",
).properties(width=800, height=300)
Putting it all together#
Now let’s grab some data across multiple stations for multiple satellites and codes.
# Fetch a week of data
start = dt.datetime(2025, 7, 20)
end = start + dt.timedelta(days=7)
table = await es.data.gnss_observations(
start_datetime=start,
end_datetime=end,
station_name=["P717", "P453", "P146", "P147", "P041"],
session_name="A",
system=["G", "R"],
obs_code=["1C", "2L"],
satellite=["7", "21", "28"],
field=["snr", "range"],
).fetch()
# Arrow table to Polars dataframe
df = pl.from_arrow(table)
# Aggregation
agg_df = (
df.with_columns(
[
pl.col("timestamp").dt.truncate("1m").alias("time_bin"),
pl.concat_str(["system", "satellite", "obs_code"], separator="-").alias(
"system_sat_obs"
),
]
)
.group_by(["time_bin", "system_sat_obs"])
.agg(
pl.col("snr").mean().alias("mean_snr"),
pl.col("range").max().alias("max_range"),
)
.sort("time_bin")
)
# plotting
agg_df.plot.point(
x="time_bin",
y="mean_snr",
color="system_sat_obs",
).properties(width=800, height=300)
# plotting
agg_df.plot.point(
x="time_bin",
y="max_range",
color="system_sat_obs",
).properties(width=800, height=300)
Geometry Free#
The geometry-free combination cancels the geometric part of the measurement, leaving all the frequency-dependent effects (i.e., ionospheric refraction, instrumental delays, wind-up) along with multipath and measurement noise. It can be used to estimate the ionospheric electron content, to detect cycle slips in the carrier phase, or to estimate antenna rotations.
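Concretely, with pseudoranges P1, P2 (in meters) and carrier phases phi1, phi2 (in cycles), the combinations computed below are gf_range = P2 - P1 and gf_phase = lambda1*phi1 - lambda2*phi2. A minimal sketch with made-up observation values (the frequencies and speed of light are the standard GPS L1/L2 constants):

```python
# Standard GPS constants
freq1 = 1575.42e6  # L1 frequency (Hz)
freq2 = 1227.60e6  # L2 frequency (Hz)
c = 299792458  # speed of light (m/s)
lambda1 = c / freq1  # L1 wavelength, ~0.1903 m
lambda2 = c / freq2  # L2 wavelength, ~0.2442 m

# Made-up sample observations (ranges in meters, phases in cycles)
range_1c, range_2w = 2.5668e7, 2.5668e7 + 7.7
phase_1c, phase_2w = 1.3489e8, 1.0511e8

# Geometry cancels: only frequency-dependent effects (plus noise) remain
gf_range = range_2w - range_1c
gf_phase = phase_1c * lambda1 - phase_2w * lambda2

print(round(gf_range, 2))  # → 7.7
```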
# Fetch a single arc of data
start = dt.datetime(2025, 7, 19, 22)
end = start + dt.timedelta(hours=18)
table = await es.data.gnss_observations(
start_datetime=start,
end_datetime=end,
station_name="AC60",
session_name="A",
system=["G"],
obs_code=["1C", "2W"],
satellite=["7"],
field=["phase", "range"],
).fetch()
# Arrow table to Polars dataframe
df = pl.from_arrow(table).sort("timestamp")
df
| timestamp | satellite | obs_code | range | phase | system | igs |
|---|---|---|---|---|---|---|
| datetime[ms, UTC] | u8 | str | f64 | f64 | str | str |
| 2025-07-20 01:13:15 UTC | 7 | "1C" | 2.5723e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:13:30 UTC | 7 | "1C" | 2.5714e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:14:15 UTC | 7 | "1C" | 2.5687e7 | null | "G" | "AC6000USA" |
| 2025-07-20 01:14:30 UTC | 7 | "1C" | 2.5677e7 | 1.3494e8 | "G" | "AC6000USA" |
| 2025-07-20 01:14:45 UTC | 7 | "1C" | 2.5668e7 | 1.3489e8 | "G" | "AC6000USA" |
| … | … | … | … | … | … | … |
| 2025-07-20 07:02:00 UTC | 7 | "2W" | 2.5648e7 | 1.0502e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:30 UTC | 7 | "1C" | 2.5665e7 | 1.3487e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:30 UTC | 7 | "2W" | 2.5665e7 | 1.0510e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:45 UTC | 7 | "1C" | 2.5674e7 | 1.3492e8 | "G" | "AC6000USA" |
| 2025-07-20 07:02:45 UTC | 7 | "2W" | 2.5674e7 | 1.0513e8 | "G" | "AC6000USA" |
# Reshape dataframe to enable diffing between 1C and 2W observations
# Pivot the obs_code column so that 1C and 2W become separate columns
df_pivoted = df.pivot(
values=["range", "phase"], # Include both range and phase for comparison
index=["igs", "timestamp", "satellite", "system"],
on="obs_code",
)
freq1 = 1575.42e6  # L1 frequency (Hz)
freq2 = 1227.60e6  # L2 frequency (Hz)
c = 299792458  # speed of light (m/s)
lambda1 = c / freq1  # wavelength of 1C (m)
lambda2 = c / freq2  # wavelength of 2W (m)
# Now you can easily calculate the difference between 1C and 2W observations
gf_df = df_pivoted.with_columns(
[
(pl.col("range_2W") - pl.col("range_1C")).alias("gf_range"),
((pl.col("phase_1C") * lambda1) - (pl.col("phase_2W") * lambda2)).alias(
"gf_phase"
),
]
)
gf_df
| igs | timestamp | satellite | system | range_1C | range_2W | phase_1C | phase_2W | gf_range | gf_phase |
|---|---|---|---|---|---|---|---|---|---|
| str | datetime[ms, UTC] | u8 | str | f64 | f64 | f64 | f64 | f64 | f64 |
| "AC6000USA" | 2025-07-20 01:13:15 UTC | 7 | "G" | 2.5723e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:13:30 UTC | 7 | "G" | 2.5714e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:15 UTC | 7 | "G" | 2.5687e7 | null | null | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:30 UTC | 7 | "G" | 2.5677e7 | null | 1.3494e8 | null | null | null |
| "AC6000USA" | 2025-07-20 01:14:45 UTC | 7 | "G" | 2.5668e7 | 2.5668e7 | 1.3489e8 | 1.0511e8 | 7.695 | -6.230897 |
| … | … | … | … | … | … | … | … | … | … |
| "AC6000USA" | 2025-07-20 07:01:30 UTC | 7 | "G" | 2.5630e7 | 2.5630e7 | 1.3468e8 | 1.0495e8 | 1.909 | -11.247463 |
| "AC6000USA" | 2025-07-20 07:01:45 UTC | 7 | "G" | 2.5639e7 | 2.5639e7 | 1.3473e8 | 1.0499e8 | 2.074 | -11.249344 |
| "AC6000USA" | 2025-07-20 07:02:00 UTC | 7 | "G" | 2.5648e7 | 2.5648e7 | 1.3478e8 | 1.0502e8 | 2.66 | -11.258719 |
| "AC6000USA" | 2025-07-20 07:02:30 UTC | 7 | "G" | 2.5665e7 | 2.5665e7 | 1.3487e8 | 1.0510e8 | 2.821 | -11.268065 |
| "AC6000USA" | 2025-07-20 07:02:45 UTC | 7 | "G" | 2.5674e7 | 2.5674e7 | 1.3492e8 | 1.0513e8 | 1.636 | -11.277862 |
# Convert to long format for easier plotting
gf_df_long = gf_df.unpivot(
["gf_range", "gf_phase"],
index=["igs", "timestamp", "satellite", "system"],
)
gf_df_long
# Plotting
gf_df_long.plot.point(x="timestamp", y="value", color="variable").properties(
width=800, height=300
)
Larger than memory requests#
Sometimes the amount of data you want to work with is larger than what fits in your machine’s memory. For such a scenario, you can use the query plan to easily iterate over subsets of the overall result set.
Note
Take a look at memory efficient access using Query Plans for more details.
In the following example, we process an entire network one day at a time.
# Create a query plan to fetch a week of data
start = dt.datetime(2025, 7, 20)
end = start + dt.timedelta(days=7)
plan = es.data.gnss_observations(
start_datetime=start,
end_datetime=end,
network_name="PERM:Alaska",
session_name="A",
system="G",
field=["phase", "range", "snr"],
)
# Helper function to print summary information about a table
def print_table_info(table):
    df = pl.from_arrow(table)
    # List the unique stations; if there are many, report just the count
    ux_igs = list(df["igs"].unique().sort())
    if len(ux_igs) > 10:
        ux_igs = len(ux_igs)
    min_time = df["timestamp"].min()
    max_time = df["timestamp"].max()
    count = len(df)
    print(count, min_time, max_time, ux_igs)
Request groups effectively limit both the concurrency of requests made to api.earthscope.org and how much data is loaded into memory at once.
In the following example, we create request groups for each day. All requests in a single group are executed in parallel and collected into a single Table. You can then do whatever you want with that table before proceeding to the next iteration of the loop where another request group is used to create the next table.
async for table in plan.group_by_day():
print_table_info(table)
1425616 2025-07-20 00:00:00+00:00 2025-07-20 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1423432 2025-07-21 00:00:00+00:00 2025-07-21 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1426148 2025-07-22 00:00:00+00:00 2025-07-22 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1425577 2025-07-23 00:00:00+00:00 2025-07-23 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1425262 2025-07-24 00:00:00+00:00 2025-07-24 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1424973 2025-07-25 00:00:00+00:00 2025-07-25 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
1423799 2025-07-26 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['ATW200USA', 'CGNN00USA', 'CLGO00USA', 'GRNX00USA', 'SELD00USA']
We can just as easily process the result set transposed: the entire time window but one station at a time:
async for table in plan.sort_by_station().group_by_station():
print_table_info(table)
2097697 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['ATW200USA']
1903112 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['CGNN00USA']
1979093 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['CLGO00USA']
2195651 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['GRNX00USA']
1799254 2025-07-20 00:00:00+00:00 2025-07-26 23:59:45+00:00 ['SELD00USA']