tccon_priors module

Main module for generating TCCON trace gas priors.

This module is the main driver to construct priors for CO2, CO, CH4, N2O, and HF for the TCCON retrieval. Broadly, each gas follows a similar scheme:

  • In the troposphere, the historical record for the gas is obtained from NOAA flask observations at Mauna Loa and Samoa (MLO/SMO). This record is deseasonalized by taking a 12-month running mean of the data. The age of air in the observation profile is calculated using a parameterization developed empirically from various in situ measurements for previous versions of the GGG package. That age is then used to determine which date in the MLO/SMO record to look up. A parameterized seasonal cycle (again, developed for previous versions of GGG from in situ observations) is then applied. We use the parameterized seasonal cycle rather than the real seasonal cycle in the MLO/SMO record because the latter does not capture any latitudinal dependence.

  • In the stratosphere, concentrations are calculated as the convolution of an age spectrum with the two-month-lagged MLO/SMO record. That is, for each mean age of air there is a defined spectrum giving the contribution of air of each transit time to that parcel; it can be thought of as a probability distribution of the age of air in the parcel. The trace gas concentration for a given mean age and date is the integral, over all transit times, of the age spectrum multiplied by the lagged trace gas record at the corresponding earlier date. A. Andrews derived separate age spectra for the tropics, midlatitudes, and polar vortex, so each level of the profile must be classified into one of these three regions by latitude, day of year, and age.

  • In the middle world (above the tropopause and theta < 380 K), the profiles are interpolated with respect to theta between the tropopause and the first overworld level.
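The three steps in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not ginput's implementation: all function names are hypothetical; the running mean is assumed to be centered; ages are converted to days at 365.25 days per year; an analytic inverse-Gaussian shape stands in for the age spectra A. Andrews derived; the parameterized age-of-air and seasonal-cycle corrections are omitted; and theta is assumed to increase monotonically with level above the tropopause. In continuous form the stratospheric step is chi(t) = integral of G(tau) * chi_MLO/SMO(t - lag - tau) d tau, where G is the age spectrum; the code approximates it with a normalized discrete sum.

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")


def to_days(times) -> np.ndarray:
    """Timestamps -> float days since 1970-01-01, so np.interp can be used."""
    return ((pd.DatetimeIndex(times) - EPOCH) / pd.Timedelta(days=1)).to_numpy()


# --- Troposphere: deseasonalize, then look up the record at (date - age) ---
def deseasonalize(monthly: pd.Series) -> pd.Series:
    """Centered 12-month running mean of a monthly-mean record."""
    return monthly.rolling(window=12, center=True).mean()


def trop_lookup(trend: pd.Series, obs_date, ages_years) -> np.ndarray:
    valid = trend.dropna()  # the running mean is NaN at both ends of the record
    ages = np.atleast_1d(np.asarray(ages_years, dtype=float))
    targets = pd.Timestamp(obs_date) - pd.to_timedelta(ages * 365.25, unit="D")
    return np.interp(to_days(targets), to_days(valid.index), valid.to_numpy())


# --- Stratosphere: convolve an age spectrum with the lagged record ---
def inverse_gaussian_spectrum(t, mean_age, width):
    """Stand-in analytic spectrum; NOT the Andrews et al. spectra."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(mean_age**3 / (4 * np.pi * width**2 * t**3)) * np.exp(
        -mean_age * (t - mean_age) ** 2 / (4 * width**2 * t))


def strat_conc(record_fn, t_obs, mean_age, width, lag=2 / 12, max_age=30.0, n=3000):
    """record_fn maps decimal years to concentration; t_obs is a decimal year."""
    transit = np.linspace(max_age / n, max_age, n)  # start above 0, where G is undefined
    g = inverse_gaussian_spectrum(transit, mean_age, width)
    weights = g / g.sum()  # uniform spacing: normalizing the sum normalizes the integral
    return float(np.sum(weights * record_fn(t_obs - lag - transit)))


# --- Middle world: interpolate in theta between tropopause and first overworld level ---
def fill_middleworld(theta, conc, theta_trop, conc_trop, theta_ow=380.0):
    theta = np.asarray(theta, dtype=float)
    out = np.array(conc, dtype=float)  # copy; the input is left untouched
    i_ow = np.flatnonzero(theta >= theta_ow)[0]  # first overworld level
    mid = (theta > theta_trop) & (theta < theta_ow)
    out[mid] = np.interp(theta[mid], [theta_trop, theta[i_ow]], [conc_trop, out[i_ow]])
    return out


# Synthetic demo data (not real MLO/SMO values).
idx = pd.date_range("2000-01-01", "2010-12-01", freq="MS")
frac_years = (idx.year - 2000) + (idx.month - 1) / 12
record = pd.Series(370 + 2 * frac_years + 3 * np.sin(2 * np.pi * (idx.month - 1) / 12), index=idx)
trend = deseasonalize(record)
trop_profile = trop_lookup(trend, "2008-07-01", [0.0, 0.5, 1.0])
c_strat = strat_conc(lambda t: 400.0 + 2.0 * (t - 2000.0), 2010.0, mean_age=3.0, width=1.0)
filled = fill_middleworld([300, 330, 350, 370, 390, 420],
                          [400, 400, np.nan, np.nan, 395, 390], theta_trop=340, conc_trop=400)
```

For a linearly increasing record, the convolution returns the record value at the observation date minus the lag minus the spectrum's mean age, which is a quick sanity check on any spectrum implementation.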

The stratospheric approach was developed by Arlyn Andrews, based on her research in Andrews et al. 2001 (JGR, 106 [D23], pp. 32295-32314). For gases other than CO2, chemical loss or production in the stratosphere must be accounted for.

  • For N2O, the relationship of N2O vs. age of air from Andrews et al. (2001) was recast as the fraction of N2O remaining vs. age, which allows us to use the MLO/SMO record directly rather than apply a growth factor. In A. Andrews' code, the N2O concentration is calculated from age using a relationship derived in the 1990s, and a growth factor is then added to account for the increase in N2O concentration since the 1990s. We instead use MLO/SMO to get the stratospheric boundary condition, then multiply it by the fraction remaining for the given age to get the actual concentration. This lets the MLO/SMO record supply the growth, rather than requiring a separately calculated growth rate.

  • For CH4, we use ACE-FTS data to derive a CH4-N2O relationship. Again, this is expressed in terms of fraction remaining for both CH4 and N2O. Therefore, for a given age, we find the fraction of N2O remaining and then use the ACE-FTS relationship to convert that to a fraction of CH4 remaining. Since the F(CH4):F(N2O) relationship varies with potential temperature, the CH4 lookup table includes a theta dependence.

  • For HF, we use relationships between HF and CH4 concentration derived in Saad et al. (2014, doi: 10.5194/amt-7-2907-2014) and Washenfelder et al. (2003, doi: 10.1029/2003GL017969). These papers derive a slope of CH4 concentration vs. HF concentration. Saad et al. find variations in the slope with latitude; for consistency with the other gases, we use the tropics/midlatitudes/polar vortex regions rather than the latitude bins defined in Saad et al. We rederive our own slopes for these three regions during the ACE-FTS era (from ~2004), prepend the Washenfelder et al. slopes going back to ~1977, and fit the slope vs. time with an exponential. The HF code uses the ACE-FTS-derived slopes directly in years where they are available and slopes from the exponential fit outside the ACE window. HF concentrations are calculated from CH4 concentrations assuming a linear relationship in which the y-intercept is the usual two-month-lagged MLO/SMO stratospheric boundary condition and the slope is the one precomputed from the ACE-FTS data or Washenfelder et al. Unlike for N2O and CH4, fraction remaining was not used: the Washenfelder et al. results make clear that the CH4/HF relationship changes substantially with time, and we do not have ACE-FTS data to derive the F(CH4)/F(HF) relationship before 2004 (after which the change in slope vs. time is essentially flat).
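The loss treatments in the list above can be sketched as follows. This is a minimal illustration, not ginput's code: the function names and every table value are hypothetical placeholders; the bilinear lookup, the log-linear exponential fit (a pure exponential with no offset), and reading the HF relationship as CH4 = CH4_bc + slope * HF (CH4 boundary condition as y-intercept, negative slope) are assumptions made for the sketch.

```python
import numpy as np

# Placeholder lookup tables. All numbers are illustrative only; they are NOT the
# Andrews et al. (2001) or ACE-FTS-derived relationships used in ginput.
AGE_GRID = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
FRAC_N2O = np.array([1.0, 0.97, 0.88, 0.75, 0.60, 0.45, 0.30])
F_N2O_GRID = np.array([0.2, 0.5, 0.8, 1.0])
THETA_GRID = np.array([400.0, 600.0, 800.0])
F_CH4_TABLE = np.array([[0.30, 0.60, 0.85, 1.0],   # rows: theta, columns: F(N2O)
                        [0.25, 0.55, 0.83, 1.0],
                        [0.20, 0.50, 0.80, 1.0]])


def frac_n2o_remaining(age):
    """np.interp clamps ages outside the table to the end values."""
    return np.interp(age, AGE_GRID, FRAC_N2O)


def frac_ch4_remaining(f_n2o, theta):
    """Bilinear lookup: along F(N2O) at each tabulated theta, then along theta (scalars)."""
    f_at_theta = np.array([np.interp(f_n2o, F_N2O_GRID, row) for row in F_CH4_TABLE])
    return float(np.interp(theta, THETA_GRID, f_at_theta))


def strat_n2o(n2o_bc, age):
    return n2o_bc * frac_n2o_remaining(age)  # growth comes from the boundary condition


def strat_ch4(ch4_bc, age, theta):
    return ch4_bc * frac_ch4_remaining(frac_n2o_remaining(age), theta)


def fit_exponential(years, slopes):
    """Least-squares fit of log|slope| vs. year (pure exponential, single sign assumed)."""
    years = np.asarray(years, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    sign, year0 = np.sign(slopes[0]), years.min()
    b, log_a = np.polyfit(years - year0, np.log(np.abs(slopes)), 1)
    return lambda y: sign * np.exp(log_a + b * (np.asarray(y, dtype=float) - year0))


def hf_slope(year, ace_slopes, exp_fit):
    """ACE-FTS slope where one exists for this year, otherwise the exponential fit."""
    return ace_slopes[year] if year in ace_slopes else float(exp_fit(year))


def strat_hf(ch4, ch4_bc, slope):
    """Invert CH4 = CH4_bc + slope * HF (y-intercept = CH4 boundary condition)."""
    return (np.asarray(ch4, dtype=float) - ch4_bc) / slope


# Synthetic demo values.
yrs = np.arange(1980, 2001)
fit = fit_exponential(yrs, -1000.0 * np.exp(-0.05 * (yrs - 1980)))
ace = {2005: -650.0, 2006: -640.0}  # hypothetical ACE-era slopes for one region
ch4 = strat_ch4(1800.0, 3.0, 500.0)
hf = strat_hf(ch4, 1800.0, hf_slope(2005, ace, fit))
```

Because N2O and CH4 are both scaled from their own boundary conditions, the secular growth in each gas is carried entirely by the MLO/SMO record, which is the point of recasting the relationships as fraction remaining.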

generate_single_tccon_prior() is the primary function for creating a single prior. It is used for all gases.

class ginput.priors.tccon_priors.CH4TropicsRecord(first_date=None, last_date=None, truncate_date=None, lag=None, mlo_file=None, smo_file=None, strat_age_scale=1.0, recalculate_strat_lut=None, save_strat=None, recalc_if_custom_dates=True, allow_negative_insitu_values=False, use_pre1p6_interpolation=False)
add_strat_prior(prof_gas, retrieval_date, mod_data, **kwargs)

Add the stratospheric component of the prior.

See the help for add_strat_prior_standard() in this module. All the inputs and outputs are the same except that gas_record will be given this instance.

classmethod get_frac_remaining_by_age(ages)

Get the fraction of a gas remaining for a given vector of ages.

The default is to assume no loss, and so 1 will be returned for every age. Subclasses may override this method to calculate more complicated relationships between age and fraction remaining.

Parameters

ages (numpy.ndarray or float) – the vector of ages to calculate the fraction of the gas remaining for

Returns

a data frame indexed by age with one column, “fraction”, containing the fraction remaining.

Return type

pandas.DataFrame
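As a minimal sketch of the default behavior described above (the function name is hypothetical, and naming the index "age" is an assumption for illustration), the no-loss case simply pairs every requested age with a fraction of 1:

```python
import numpy as np
import pandas as pd


def default_frac_remaining_by_age(ages) -> pd.DataFrame:
    """No-loss default: the fraction remaining is 1 at every age."""
    ages = np.atleast_1d(np.asarray(ages, dtype=float))  # accept a scalar or a vector
    return pd.DataFrame({"fraction": np.ones_like(ages)}, index=pd.Index(ages, name="age"))


frac = default_frac_remaining_by_age([0.5, 2.0, 5.0])
```

A subclass for a gas with stratospheric loss (such as N2O) would return values below 1 for older air instead.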

lat_bias_correction(obs_date, obs_lat, mod_data, prior_data)

Returns a latitudinal bias correction to add to the prior.

Parameters
  • obs_date (datetime-like) – the date of the observation

  • obs_lat (float) – the latitude of the observation

  • mod_data (dict) – the dictionary of data read in from the .mod file

  • prior_data (dict) – a dictionary of data calculated for the prior, including the keys: “age_of_air” (the tropospheric age of air profile), “adj_zgrid” (the adjusted altitude grid used for the tropospheric prior) and “z_trop” (the tropopause height).

Returns

a float or float array to add to the prior profile to correct latitudinally-dependent biases in the troposphere.

Return type

float or array-like

class ginput.priors.tccon_priors.CO2TropicsRecord(first_date=None, last_date=None, truncate_date=None, lag=None, mlo_file=None, smo_file=None, strat_age_scale=1.0, recalculate_strat_lut=None, save_strat=None, recalc_if_custom_dates=True, allow_negative_insitu_values=False, use_pre1p6_interpolation=False)
lat_bias_correction(obs_date, obs_lat, mod_data, prior_data)

Returns a latitudinal bias correction to add to the prior.

Parameters
  • obs_date (datetime-like) – the date of the observation

  • obs_lat (float) – the latitude of the observation

  • mod_data (dict) – the dictionary of data read in from the .mod file

  • prior_data (dict) – a dictionary of data calculated for the prior, including the keys: “age_of_air” (the tropospheric age of air profile), “adj_zgrid” (the adjusted altitude grid used for the tropospheric prior) and “z_trop” (the tropopause height).

Returns

a float or float array to add to the prior profile to correct latitudinally-dependent biases in the troposphere.

Return type

float or array-like

exception ginput.priors.tccon_priors.GasRecordDateError

Error to raise for any issues with dates in the gas records

exception ginput.priors.tccon_priors.GasRecordError

Base error for problems in the gas records

exception ginput.priors.tccon_priors.GasRecordExtrapolationError

Error to raise if there’s a problem with the extrapolation of the MLO/SMO records.

exception ginput.priors.tccon_priors.GasRecordInconsistentDimsError

Error raised when arrays in a gas record that are supposed to have the same dimensions do not

exception ginput.priors.tccon_priors.GasRecordInputMissingError

Error to raise when the input files needed for a trace gas record cannot be found

exception ginput.priors.tccon_priors.GasRecordInputVerificationError

Error raised when the input could not be verified (e.g. if hashes don’t match)

class ginput.priors.tccon_priors.HFTropicsRecord(first_date=None, last_date=None, truncate_date=None, lag=None, mlo_file=None, smo_file=None, strat_age_scale=1.0, recalculate_strat_lut=None, save_strat=None, recalc_if_custom_dates=True, allow_negative_insitu_values=False, use_pre1p6_interpolation=False)
add_extra_column(prof_gas, retrieval_date, mod_data, **kwargs)

Add a representation of out-of-range column density to the prior.

This method is intended to handle cases like CO, which has a fairly large mesospheric column that can’t be directly represented in the prior. It adds extra concentration to one or more levels in the prior such that, when integrated, the extra column density is accounted for. Ideally, this should be called after interpolating to the final levels on which the priors will be used in GGG, so that the column can be reproduced exactly by the integration.

Parameters
  • prof_gas (numpy.ndarray) – the profile to modify. Modified in-place

  • retrieval_date – the date/time of the prior profile (unused, present for consistency)

  • mod_data (dict) – the dictionary of model data read in from the .mod file.

  • kwargs – unused, swallows extra keyword arguments.

Returns

the modified gas profile and a dictionary of ancillary information (currently empty).
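A minimal sketch of the idea, assuming (hypothetically) that the column is integrated as the sum over levels of DMF × air number density × layer thickness and that all of the extra column is placed in the top level. GGG's actual integration scheme and the choice of levels are not reproduced here; the function names and values are illustrative:

```python
import numpy as np


def add_extra_column_top(prof_gas, n_air, dz, extra_column):
    """Add extra_column (molecules/cm^2) to the top level of a DMF profile, in place.

    Assumes the column is integrated as sum(dmf * n_air * dz), with n_air in
    molecules/cm^3 and dz (the thickness each level represents) in cm.
    """
    prof_gas[-1] += extra_column / (n_air[-1] * dz[-1])
    return prof_gas, {}  # mirrors the (profile, ancillary dict) return described above


def column(prof_gas, n_air, dz):
    return float(np.sum(prof_gas * n_air * dz))


prof = np.array([100e-9, 90e-9, 80e-9])        # DMF (mol/mol), bottom to top
n_air = np.array([2.5e19, 1.0e19, 4.0e18])     # molecules/cm^3
dz = np.array([1.0e5, 1.0e5, 1.0e5])           # cm
before = column(prof, n_air, dz)
prof, info = add_extra_column_top(prof, n_air, dz, extra_column=1.0e16)
after = column(prof, n_air, dz)
```

Because the increment is computed with the same integration rule that is later applied, the integrated column grows by exactly the requested amount, which is why the text recommends calling this method on the final levels.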

classmethod get_mlo_smo_mean(mlo_file, smo_file, first_date, last_date, truncate_date, allow_negative_insitu_values=False, use_pre1p6_interpolation=False)

Generate the Mauna Loa/Samoa mean trace gas record.

For HF, there is no MLO/SMO record because it has no presence in the troposphere. Since this method is called by __init__ to set the seasonal cycle concentration, we override it to just create a data frame with the correct format but with concentrations of 0 for all times.

Parameters
  • mlo_file (str) – unused, kept for consistency with other TraceGasTropicsRecord subclasses

  • smo_file (str) – unused, kept for consistency with other TraceGasTropicsRecord subclasses

  • first_date (datetime-like) – the earliest date to use in the record. Note that the actual first date will always be the first of the month for this date.

  • last_date (datetime-like) – the latest date to include in the record. Note that if it is not the first of the month, then the actual latest date used would be the next first of the month to follow this date. I.e. if this is June 15th, then July 1st would be used instead.

  • truncate_date – unused, since HF has no MLO/SMO data.

  • allow_negative_insitu_values (bool) – set to True to allow the in situ files to include negative DMF values. Normally this is not allowed, as the DMFs for long-lived gases should be positive and negative values normally indicate a fill value is present. Such fill values will lead to incorrect combined MLO+SMO values. Note that this has no effect for HFTropicsRecord; it is included only as part of the required interface.

  • use_pre1p6_interpolation (bool) – unused, since HF has no MLO/SMO data.

Returns

the data frame containing the mean trace gas concentration (‘dmf_mean’), a flag (‘interp_flag’) set to 1 for any months that had to be interpolated and 2 for months that had to be extrapolated, and the latency (‘latency’) in years that a concentration had to be extrapolated. Indexed by timestamp.

Return type

pandas.DataFrame
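The zero-filled record can be sketched with pandas as follows. The month-boundary handling follows the first_date and last_date descriptions above; the function names are hypothetical, and the interp_flag and latency values of 0 are assumptions, since the text does not say what HFTropicsRecord stores there:

```python
import pandas as pd


def month_start(date) -> pd.Timestamp:
    return pd.Timestamp(date).to_period("M").to_timestamp()


def zero_mlo_smo_record(first_date, last_date) -> pd.DataFrame:
    """All-zero monthly record with the same columns as a real MLO/SMO mean record."""
    start = month_start(first_date)                 # first date moves back to the 1st
    end = pd.Timestamp(last_date)
    if end != month_start(end):                     # last date moves forward to the next 1st
        end = (end.to_period("M") + 1).to_timestamp()
    idx = pd.date_range(start, end, freq="MS")
    # Flag and latency values of 0 are assumptions made for this sketch.
    return pd.DataFrame({"dmf_mean": 0.0, "interp_flag": 0, "latency": 0.0}, index=idx)


hf = zero_mlo_smo_record("2000-01-15", "2000-06-15")
```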

list_strat_dependent_files()

Return a dictionary describing the files that the stratospheric LUT depends on.

The keys of this dictionary are the attribute names to use in the LUT netCDF file, and the values are the paths to the files that the LUT depends on. Each file’s SHA1 hash is stored in the netCDF file under the global attribute named by its key.

For most trace gas records, this will be the Mauna Loa and Samoa flask data files. However, if certain trace gas records depend on other files, this method should be overridden to return the proper dictionary.

Return type

dict
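The hash bookkeeping described above can be sketched with the standard library's hashlib. This is an illustration, not ginput's code: the function names and the "mlo_file" attribute name are hypothetical, and comparing stored and current hashes is one plausible way to implement the recalculate_strat_lut=None check described for MloSmoTraceGasRecord:

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of_file(path, chunk_size=65536) -> str:
    """SHA1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dependency_hashes(dependent_files: dict) -> dict:
    """Map each attribute name (dict key) to the SHA1 of the file it points to."""
    return {attr: sha1_of_file(path) for attr, path in dependent_files.items()}


def lut_needs_recalc(stored_attrs: dict, dependent_files: dict) -> bool:
    """True if any stored hash is missing or differs from the file's current hash."""
    current = dependency_hashes(dependent_files)
    return any(stored_attrs.get(attr) != digest for attr, digest in current.items())


with tempfile.TemporaryDirectory() as tmp:
    mlo = Path(tmp) / "mlo_monthly.txt"
    mlo.write_bytes(b"MLO 2000 1 369.0\n")
    files = {"mlo_file": mlo}           # hypothetical attribute name
    stored = dependency_hashes(files)   # what would be written as netCDF attributes
    stale_before = lut_needs_recalc(stored, files)
    mlo.write_bytes(b"MLO 2000 1 369.1\n")
    stale_after = lut_needs_recalc(stored, files)
```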

class ginput.priors.tccon_priors.MloSmoTraceGasRecord(first_date=None, last_date=None, truncate_date=None, lag=None, mlo_file=None, smo_file=None, strat_age_scale=1.0, recalculate_strat_lut=None, save_strat=None, recalc_if_custom_dates=True, allow_negative_insitu_values=False, use_pre1p6_interpolation=False)

This class stores the Mauna Loa/Samoa average DMF record and provides methods to derive a full prior profile from it.

Initialization arguments:

Parameters
  • first_date (datetime-like) – optional, the first date required in the concentration records. The actual first date will be before this, as the age spectra calculation requires ~30 years of data preceding each date, so the simple time series records will be extended to first_date - 30 years. If not given, then 1 Jan 2000 is assumed (meaning the actual first date will be 1 Jan 1970). The date will always be moved to the first of a month.

  • last_date (datetime-like) – optional, the last date required in the concentration records. The date will always be moved to the first of a month. If omitted, a date two years from today is used. Unlike first_date, there is no modification to account for the needs of the age spectra.

  • truncate_date – the last date to use real data for in the record, after this date the MLO/SMO time series will be extrapolated. Note that this is inclusive.

  • lag (timedelta-like) – optional, the lag between Mauna Loa/Samoa measurements and the stratospheric boundary condition. Default is two months, i.e. the stratospheric boundary condition for a given date is assumed to be that measured at MLO/SMO two months previously.

  • mlo_file (str) – optional, the path to the Mauna Loa flask data file. Must be formatted as a NOAA monthly flask data file, where the first line is “# f_header_lines: n” (n being the number of header lines) and the data being organized in four columns (space separated): site, year, month, value.

  • smo_file (str) – optional, the path to the Samoa flask data file. Same format as the MLO file required.

  • recalculate_strat_lut (bool or None) – optional, set to True to force the stratospheric concentration lookup table to be recalculated or False to always use the existing lookup table if it exists. Default is None, which will check if any of the files that the LUT depends on have changed, and recalculate it if so.

  • save_strat (bool or None) – optional, set to False to avoid saving the stratospheric concentration lookup if it is recalculated. Default is None, which will save the LUT if recalculated unless it was recalculated to cover the time frame requested. This option has no effect if the stratospheric lookup table is read from the netCDF file.

  • allow_negative_insitu_values (bool) – set to True to allow the in situ files to include negative DMF values. Normally this is not allowed, as the DMFs for long-lived gases should be positive and negative values normally indicate a fill value is present. Such fill values will lead to incorrect combined MLO+SMO values.

  • use_pre1p6_interpolation – set to True to use the method of interpolating (and extrapolating) MLO and SMO data that was in place before ginput v1.6. This is provided for backwards compatibility only; the new default is recommended as it better handles gaps in the MLO or SMO data.
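The documented defaults for first_date and last_date can be checked with a short sketch. The function name is hypothetical, and moving dates back to the start of their month is an assumption (the text only says the dates are moved to the first of a month):

```python
import pandas as pd


def effective_date_range(first_date=None, last_date=None, today=None, spectra_years=30):
    """Sketch of the documented defaults for the record's date range.

    Assumption: dates are moved back to the start of their month.
    """
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    first = pd.Timestamp(2000, 1, 1) if first_date is None else pd.Timestamp(first_date)
    last = today + pd.DateOffset(years=2) if last_date is None else pd.Timestamp(last_date)
    # Extend backward so the age spectra have ~30 years of history before first_date.
    first = first.to_period("M").to_timestamp() - pd.DateOffset(years=spectra_years)
    last = last.to_period("M").to_timestamp()  # no age-spectra extension at the end
    return first, last


first, last = effective_date_range(today="2024-03-10")
```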

add_extra_column(prof_gas, retrieval_date, mod_data, **kwargs)

Add a representation of out-of-range column density to the prior.

This method is intended to handle cases like CO, which has a fairly large mesospheric column that can’t be directly represented in the prior. It adds extra concentration to one or more levels in the prior such that, when integrated, the extra column density is accounted for. Ideally, this should be called after interpolating to the final levels on which the priors will be used in GGG, so that the column can be reproduced exactly by the integration.

Parameters
  • prof_gas (numpy.ndarray) – the profile to modify. Modified in-place

  • retrieval_date – the date/time of the prior profile (unused, present for consistency)

  • mod_data (dict) – the dictionary of model data read in from the .mod file.

  • kwargs – unused, swallows extra keyword arguments.

Returns

the modified gas profile and a dictionary of ancillary information (currently empty).

add_strat_prior(prof_gas, retrieval_date, mod_data, **kwargs)

Add the stratospheric component of the prior.

See the help for add_strat_prior_standard() in this module. All the inputs and outputs are the same except that gas_record will be given this instance.

add_trop_prior(prof_gas, obs_date, obs_lat, mod_data, use_adjusted_zgrid=True, **kwargs)

Add the tropospheric component of the prior.

See the help for add_trop_prior_standard() in this module. All the inputs and outputs are the same except that gas_record will be given this instance.

avg_gas_in_date_range(start_date, end_date, deseasonalize=False)

Average the MLO/SMO record between the given dates

Parameters
  • start_date (datetime-like object) – the first date in the averaging period

  • end_date (datetime-like object) – the last date in the averaging period

  • deseasonalize (bool) – whether to draw concentration data from the trend only (True) or the seasonal cycle (False).

Returns

the average concentration and a dictionary specifying the mean, minimum, and maximum latency (number of years the concentrations had to be extrapolated)

Return type

float, dict
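A minimal pandas sketch of the averaging, assuming a monthly data frame with the "dmf_mean" and "latency" columns described for get_mlo_smo_mean_joint_fill(). With deseasonalize=True the real method would presumably draw from the trend record (conc_trend) rather than the seasonal one (conc_seasonal). The function name and the keys of the latency dictionary are hypothetical:

```python
import pandas as pd


def avg_in_date_range(record: pd.DataFrame, start_date, end_date):
    """Mean of 'dmf_mean' between two dates (inclusive) plus latency statistics."""
    sub = record.loc[pd.Timestamp(start_date):pd.Timestamp(end_date)]
    latency = sub["latency"]
    return float(sub["dmf_mean"].mean()), {
        "mean": float(latency.mean()),
        "min": float(latency.min()),
        "max": float(latency.max()),
    }


idx = pd.date_range("2000-01-01", periods=6, freq="MS")
rec = pd.DataFrame(
    {"dmf_mean": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "latency": [0, 0, 0, 0, 0.5, 1.0]},
    index=idx,
)
avg, lat = avg_in_date_range(rec, "2000-03-01", "2000-06-01")
```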

conc_df_to_nc(nc_file: PathLike, trend: bool = False)

Write the seasonal or trend concentration dataframe to a netCDF file.

Parameters
  • nc_file – path to write the netCDF file to. Will be overwritten if it exists.

  • trend – if True, writes self.conc_trend. Otherwise, writes self.conc_seasonal.

classmethod get_frac_remaining_by_age(ages)

Get the fraction of a gas remaining for a given vector of ages.

The default is to assume no loss, and so 1 will be returned for every age. Subclasses may override this method to calculate more complicated relationships between age and fraction remaining.

Parameters

ages (numpy.ndarray or float) – the vector of ages to calculate the fraction of the gas remaining for

Returns

a data frame indexed by age with one column, “fraction”, containing the fraction remaining.

Return type

pandas.DataFrame

get_gas_by_age(ref_date, age, deseasonalize=False, as_dataframe=False)

Get concentrations for one or more times by specifying a reference date and age.

This calls get_gas_for_dates() internally, so the concentration is interpolated to the specific day just as in that method.

Parameters
  • ref_date (datetime-like object) – the date that the ages are relative to.

  • age (float or sequence of floats) – the number of years before the reference date to get the concentration from. May be a non-whole number.

  • deseasonalize (bool) – whether to draw concentration data from the trend only (True) or the seasonal cycle (False).

  • as_dataframe (bool) – whether to return the concentration data as a dataframe (True) or numpy array (False)

Returns

the concentration data for the requested date(s), as a numpy vector or data frame. The data frame will also include the latency (how many years the concentrations had to be extrapolated).
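The age-to-date conversion can be sketched as below. The 365.25-days-per-year convention and the function name are assumptions; the resulting dates would then be passed to get_gas_for_dates() for interpolation:

```python
import numpy as np
import pandas as pd


def dates_from_ages(ref_date, ages) -> pd.DatetimeIndex:
    """Convert ages (years before ref_date, may be fractional) into dates."""
    ages = np.atleast_1d(np.asarray(ages, dtype=float))
    return pd.Timestamp(ref_date) - pd.to_timedelta(ages * 365.25, unit="D")


dates = dates_from_ages("2020-01-01", [0.0, 1.0, 2.5])
```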

get_gas_by_month(year, month, deseasonalize=False)

Get the trace gas concentration for a specific month

Parameters
  • year (int) – the date’s year

  • month (int) – the date’s month

  • deseasonalize (bool) – whether to draw concentration data from the trend only (True) or the seasonal cycle (False).

Returns

the gas concentration and a dictionary with additional information (e.g. the latency, that is, how far the concentrations had to be extrapolated).

Return type

float, dict

get_gas_for_dates(dates, deseasonalize=False, as_dataframe=False)

Get trace gas concentrations for one or more dates.

This method will look up concentrations for a specific date or dates, interpolating between the monthly values as necessary.

Parameters
  • dates – the date or dates to get concentrations for. If giving a single date, it may be any time that can be converted to a Pandas Timestamp. If giving a series of dates, it must be a pandas.DatetimeIndex.

  • deseasonalize (bool) – whether to draw concentrations data from the trend only (True) or the seasonal cycle (False).

  • as_dataframe (bool) – whether to return the concentrations data as a dataframe (True) or numpy array (False)

Returns

the concentration data for the requested date(s), as a numpy vector or data frame. The data frame will also include the latency (how many years the concentrations had to be extrapolated).
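A minimal sketch of interpolating a first-of-month record to arbitrary days, using linear interpolation in time with numpy.interp (an assumed choice; note that numpy.interp clamps rather than extrapolates outside the record). The function name is hypothetical:

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")


def interp_monthly(record: pd.Series, dates):
    """Linearly interpolate a first-of-month record to arbitrary dates."""
    single = not isinstance(dates, pd.DatetimeIndex)
    idx = pd.DatetimeIndex([pd.Timestamp(dates)]) if single else dates
    x = ((idx - EPOCH) / pd.Timedelta(days=1)).to_numpy()
    xp = ((record.index - EPOCH) / pd.Timedelta(days=1)).to_numpy()
    values = np.interp(x, xp, record.to_numpy())
    return float(values[0]) if single else values


# Values rise by exactly 1 per day, so interpolated results are easy to check.
rec = pd.Series([10.0, 41.0, 70.0], index=pd.date_range("2000-01-01", periods=3, freq="MS"))
```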

classmethod get_mlo_smo_mean_joint_fill(mlo_file: PathLike, smo_file: PathLike, first_date: datetime, last_date: datetime, truncate_date: datetime, allow_negative_insitu_values: bool = False) → DataFrame

Generate the Mauna Loa/Samoa mean trace gas record from the files stored in this repository.

This is the original method used in v1.0 to v1.5. It only uses months where both MLO and SMO had data and fills in the rest with linear interpolation or trend + seasonal cycle extrapolation. However, this means it does not handle large gaps in data from either site well, which is why it was deprecated in v1.6 in favor of get_mlo_smo_mean_separate_fill().

Reads in the given Mauna Loa and Samoa record files, averages them, fills in missing values by interpolation, extrapolates as needed to provide the full record requested, and returns the result.

Parameters
  • mlo_file (str) – the name (not the full path) of the Mauna Loa flask data file that is included in the repo data directory.

  • smo_file (str) – the name (not the full path) of the Samoa flask data file that is included in the repo data directory.

  • first_date (datetime-like) – the earliest date to use in the record. Note that the actual first date will always be the first of the month for this date.

  • last_date (datetime-like) – the latest date to include in the record. Note that if it is not the first of the month, then the actual latest date used would be the next first of the month to follow this date. I.e. if this is June 15th, then July 1st would be used instead.

  • truncate_date – the last date to use real data for in the record, after this date the MLO/SMO time series will be extrapolated. Note that this is inclusive.

  • allow_negative_insitu_values – set to True to allow the in situ files to include negative DMF values. Normally this is not allowed, as the DMFs for long-lived gases should be positive and negative values normally indicate a fill value is present. Such fill values will lead to incorrect combined MLO+SMO values.

Returns

the data frame containing the mean trace gas concentration (‘dmf_mean’), a flag (‘interp_flag’) set to 1 for any months that had to be interpolated and 2 for months that had to be extrapolated, and the latency (‘latency’) in years that a concentration had to be extrapolated. Indexed by timestamp.
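The joint-fill logic can be sketched as follows: a month gets a mean only when both sites report, and interior gaps are filled by linear interpolation in time. This is a simplified illustration with a hypothetical function name; the trend + seasonal cycle extrapolation is not implemented, so months that would need it are left as NaN and flagged 2:

```python
import numpy as np
import pandas as pd


def joint_mean(mlo: pd.Series, smo: pd.Series) -> pd.DataFrame:
    """Average MLO and SMO only in months where both have data, then fill interior gaps.

    Months outside the jointly covered range stay NaN here and get flag 2; the real
    method would extrapolate them with a trend + seasonal cycle.
    """
    both = pd.concat({"mlo": mlo, "smo": smo}, axis=1)
    mean = both.mean(axis=1, skipna=False)                # NaN unless both sites report
    filled = mean.interpolate(method="time", limit_area="inside")
    flag = np.where(mean.notna(), 0, np.where(filled.notna(), 1, 2))
    return pd.DataFrame({"dmf_mean": filled, "interp_flag": flag}, index=mean.index)


idx = pd.date_range("2000-01-01", periods=6, freq="MS")
mlo = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0], index=idx)
smo = pd.Series([3.0, 4.0, 5.0, np.nan, 7.0, np.nan], index=idx)
joint = joint_mean(mlo, smo)
```

Note how months 3 and 4 are interpolated even though one site had real data in each; discarding single-site months is exactly why long single-site gaps were handled poorly.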

classmethod get_mlo_smo_mean_separate_fill(mlo_file: PathLike, smo_file: PathLike, first_date: datetime, last_date: datetime, truncate_date: datetime, allow_negative_insitu_values: bool = False) → DataFrame

Compute the mean timeseries of MLO and SMO data by filling in missing data for each site separately.

Parameters
  • mlo_file – path to the MLO (Mauna Loa) monthly average file.

  • smo_file – path to the SMO (American Samoa) monthly average file.

  • first_date – the earliest date required in the timeseries. If the monthly mean file does not extend back to this date, it will be extrapolated. Should be a date on the first of a month.

  • last_date – the latest date required in the timeseries. If the monthly mean file does not extend forward to this date, it will be extrapolated. Should be a date on the first of a month.

  • truncate_date – the last date of real data to use in the timeseries. If the input file does not include this date, a GasRecordDateError will be raised. Otherwise, the data from the file will be cut off at this date and extrapolated from there to last_date. This allows you to ensure reproducibility even if you update the input file with additional data.

  • allow_negative_insitu_values – set to True to allow the in situ files to contain negative values. Because we are reading files for gases like CO2, N2O, and CH4 (which have large background concentrations), negative values usually mean a fill value exists in the file. Fill values should be replaced with NaNs. If your files actually have negative values, then you will need to set this to True.

Returns

a dataframe with columns,

  • “dmf_mean”: the mean mole fraction between MLO and SMO.

  • “latency”: the mean extrapolation latency between the two sites.

  • “interp_flag”: a flag that will be 0 if both sites had real data, 1 if both sites were interpolated, 2 if both sites were extrapolated, and 3 if the two sites differed in how that month’s value was obtained.

  • “mlo_interp_detail_flag”: the detailed interpolation flag for MLO, see read_and_fill_insitu_gas() for the bit meanings.

  • “smo_interp_detail_flag”: same, but for SMO.
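Combining the two separately filled site records into the interp_flag values listed above can be sketched as follows. The per-site column names ("dmf", "latency", "interp_flag") and the function name are hypothetical, and the detailed per-site flags are omitted:

```python
import pandas as pd


def combine_sites(mlo: pd.DataFrame, smo: pd.DataFrame) -> pd.DataFrame:
    """Combine two already gap-filled site records into the MLO/SMO mean record.

    Each input has columns 'dmf', 'latency', and 'interp_flag'
    (0 = real, 1 = interpolated, 2 = extrapolated).
    """
    same_method = mlo["interp_flag"] == smo["interp_flag"]
    return pd.DataFrame({
        "dmf_mean": (mlo["dmf"] + smo["dmf"]) / 2,
        "latency": (mlo["latency"] + smo["latency"]) / 2,
        # keep the shared flag where both sites agree, otherwise 3
        "interp_flag": mlo["interp_flag"].where(same_method, 3),
    })


idx = pd.date_range("2020-01-01", periods=4, freq="MS")
mlo = pd.DataFrame({"dmf": [410.0, 411.0, 412.0, 413.0], "latency": [0, 0, 0.1, 0.0],
                    "interp_flag": [0, 1, 2, 0]}, index=idx)
smo = pd.DataFrame({"dmf": [408.0, 409.0, 410.0, 411.0], "latency": [0, 0, 0.1, 0.2],
                    "interp_flag": [0, 1, 2, 2]}, index=idx)
combined = combine_sites(mlo, smo)
```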

get_strat_gas(date, ages, eqlat, theta=None, as_dataframe=False)

Get stratospheric gas concentration for a given profile

Parameters
  • date (datetime-like) – the UTC date of the observation

  • ages (array-like) – the age or ages of air (in years) to get concentration for. Must be the same shape as eqlat.

  • eqlat (array-like) – the equivalent latitude or eq. lat profile to get concentration for. Must be the same shape as ages.

  • theta – the potential temperature profile associated with the prior. Only required if the

  • as_dataframe (bool) – if True, the gas concentration will be returned as a data frame. If False, it will be returned as an array if ages and eqlat were arrays or a float if they were floats.

Returns

the gas concentration as a data frame, numpy array, or scalar, depending on as_dataframe and the input types. Also returns None, a placeholder for future information about profile latency, etc.

Return type

float, numpy.ndarray, or pandas.DataFrame

lat_bias_correction(obs_date, obs_lat, mod_data, prior_data)

Returns a latitudinal bias correction to add to the prior.

Parameters
  • obs_date (datetime-like) – the date of the observation

  • obs_lat (float) – the latitude of the observation

  • mod_data (dict) – the dictionary of data read in from the .mod file

  • prior_data (dict) – a dictionary of data calculated for the prior, including the keys: “age_of_air” (the tropospheric age of air profile), “adj_zgrid” (the adjusted altitude grid used for the tropospheric prior) and “z_trop” (the tropopause height).

Returns

a float or float array to add to the prior profile to correct latitudinally-dependent biases in the troposphere.

Return type

float or array-like

list_strat_dependent_files()

Return a dictionary describing the files that the stratospheric LUT depends on.

The keys of this dictionary are the attribute names to use in the LUT netCDF file, and the values are the paths to the files that the LUT depends on. Each file’s SHA1 hash is stored in the netCDF file under the global attribute named by its key.

For most trace gas records, this will be the Mauna Loa and Samoa flask data files. However, if certain trace gas records depend on other files, this method should be overridden to return the proper dictionary.

Return type

dict

classmethod read_and_fill_insitu_gas(full_file_path: PathLike, first_date: datetime, last_date: datetime, truncate_date: datetime, allow_negative_values: bool = False, max_months_simple_interp: int = 0, nyears: Optional[int] = None) → DataFrame

Read a trace gas file and fill in missing values out to the first and last date required.

Assumes that the file contains monthly mean DMFs.

Parameters
  • full_file_path – path to the monthly mean file to read.

  • first_date – the earliest date required in the timeseries. If the monthly mean file does not extend back to this date, it will be extrapolated. Should be a date on the first of a month.

  • last_date – the latest date required in the timeseries. If the monthly mean file does not extend forward to this date, it will be extrapolated. Should be a date on the first of a month.

  • truncate_date – the last date of real data to use in the timeseries. If the input file does not include this date, a GasRecordDateError will be raised. Otherwise, the data from the file will be cut off at this date and extrapolated from there to last_date. This allows you to ensure reproducibility even if you update the input file with additional data.

  • allow_negative_values – set to True to allow the in situ file to contain negative values. Because we are reading files for gases like CO2, N2O, and CH4 (which have large background concentrations), negative values usually mean a fill value exists in the file. Fill values should be replaced with NaNs. If your file actually has negative values, then you will need to set this to True.

  • max_months_simple_interp – controls how missing data within the file is filled in. For gaps of this many months or fewer, the gap will be filled with a simple linear interpolation in time. For larger gaps, it will be filled with the same logic as is used for the extrapolation, using a combination of the fitted secular trend and mean seasonal cycle.

  • nyears – the number of years used to fit the secular trend for interpolating large gaps and extrapolating backward from the start and forward from the end of the real data. For interpolation, the trend is fit and mean seasonal cycle calculated using this many years on either side of the gap. For extrapolation, it uses this many years from the start or end of the record. If this value is not specified, then it uses cls._nyears_for_extrap_avg.

Returns

a dataframe containing the gas values, latency (for extrapolated values), and two flags describing the interpolation/extrapolation. interp_flag will be 0 for real data, 1 for interpolated data, and 2 for extrapolated data. interp_detail_flag is a bit flag: the 1s bit indicates it was extrapolated, the 2s bit indicates the simple, linear interpolation was used, and the 4s bit indicates the trend + seasonal cycle interpolation was used.
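The flag logic can be sketched by classifying each missing month according to where it falls and how long its gap is. This sketch only assigns flags; it does not compute the filled values (no trend fit or seasonal cycle), and the function name is hypothetical:

```python
import numpy as np
import pandas as pd

EXTRAP, SIMPLE_INTERP, TREND_INTERP = 1, 2, 4  # bit values described above


def classify_gaps(values: pd.Series, max_months_simple_interp: int = 0) -> pd.DataFrame:
    """Flag how each missing month would be filled; does not compute the filled values."""
    missing = values.isna().to_numpy()
    detail = np.zeros(len(values), dtype=int)
    valid = np.flatnonzero(~missing)
    if valid.size == 0:
        detail[:] = EXTRAP
    else:
        first, last = valid[0], valid[-1]
        detail[:first] = EXTRAP           # before the real data starts
        detail[last + 1:] = EXTRAP        # after the real data ends
        i = first
        while i <= last:
            if missing[i]:
                j = i
                while missing[j]:         # interior gap: always ends at or before `last`
                    j += 1
                gap_len = j - i
                detail[i:j] = SIMPLE_INTERP if gap_len <= max_months_simple_interp else TREND_INTERP
                i = j
            else:
                i += 1
    interp_flag = np.where(detail & EXTRAP, 2, np.where(detail > 0, 1, 0))
    return pd.DataFrame({"interp_flag": interp_flag, "interp_detail_flag": detail},
                        index=values.index)


idx = pd.date_range("2000-01-01", periods=9, freq="MS")
vals = pd.Series([np.nan, 1.0, np.nan, 2.0, np.nan, np.nan, np.nan, 3.0, np.nan], index=idx)
flags = classify_gaps(vals, max_months_simple_interp=2)
```

With the default max_months_simple_interp=0, every interior gap would be flagged for trend + seasonal cycle filling.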

classmethod read_insitu_gas(full_file_path: PathLike, allow_negative_values: bool = False) → DataFrame

Read a trace gas record file. Assumes that the file is of monthly average concentrations.

Parameters
  • full_file_path – the path to the file.

  • allow_negative_values – set to True to allow the in situ file to contain negative values. Because we are reading files for gases like CO2, N2O, and CH4 (which have large background concentrations), negative values usually mean a fill value exists in the file. Fill values should be replaced with NaNs. If your file actually has negative values, then you will need to set this to True.

Returns

a data frame containing the monthly trace gas data along with the site, year, month, and day. The index will be a timestamp of the measurement time.
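A minimal reader for the NOAA monthly flask format described for mlo_file (a "# f_header_lines: n" first line followed by whitespace-separated site, year, month, value columns). Assumptions: n counts the first line itself, each value is stamped on the first day of its month, the "value" column name and the function name are hypothetical, and negative values become NaN unless allowed:

```python
import io

import numpy as np
import pandas as pd


def read_noaa_monthly(fobj, allow_negative_values=False) -> pd.DataFrame:
    """Read a NOAA monthly flask file from an open text file object."""
    key, _, n = fobj.readline().lstrip("#").partition(":")
    if key.strip() != "f_header_lines":
        raise ValueError("first line must be '# f_header_lines: n'")
    # The first header line was already consumed, so skip the remaining n - 1.
    df = pd.read_csv(fobj, sep=r"\s+", skiprows=int(n) - 1, header=None,
                     names=["site", "year", "month", "value"])
    if not allow_negative_values:
        # negative DMFs of long-lived gases almost certainly indicate fill values
        df.loc[df["value"] < 0, "value"] = np.nan
    df["day"] = 1
    df.index = pd.to_datetime(df[["year", "month", "day"]])
    return df


text = """# f_header_lines: 2
# site year month value
MLO 2000 1 369.25
MLO 2000 2 -999.99
MLO 2000 3 370.50
"""
df = read_noaa_monthly(io.StringIO(text))
```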

class ginput.priors.tccon_priors.N2OTropicsRecord(first_date=None, last_date=None, truncate_date=None, lag=None, mlo_file=None, smo_file=None, strat_age_scale=1.0, recalculate_strat_lut=None, save_strat=None, recalc_if_custom_dates=True, allow_negative_insitu_values=False, use_pre1p6_interpolation=False)
classmethod get_frac_remaining_by_age(ages)

Get the fraction of a gas remaining for a given vector of ages.

The default is to assume no loss, and so 1 will be returned for every age. Subclasses may override this method to calculate more complicated relationships between age and fraction remaining.

Parameters

ages (numpy.ndarray or float) – the vector of ages to calculate the fraction of the gas remaining for

Returns

a data frame indexed by age with one column, “fraction” containing the fraction remaining.

Return type

pandas.DataFrame
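A sketch of the default no-loss behavior and of what an override might look like. The exponential form with a fixed lifetime is purely illustrative and is an assumption; it is not the relationship N2OTropicsRecord uses. Both function names and the 'age' index name are hypothetical.

```python
import numpy as np
import pandas as pd


def frac_remaining_no_loss(ages) -> pd.DataFrame:
    """Default behavior: no chemical loss, so the fraction remaining is 1 for every age."""
    ages = np.atleast_1d(np.asarray(ages, dtype=float))
    return pd.DataFrame({'fraction': np.ones_like(ages)}, index=pd.Index(ages, name='age'))


def frac_remaining_exponential(ages, lifetime_years: float) -> pd.DataFrame:
    """Hypothetical override: first-order loss with a fixed e-folding lifetime."""
    ages = np.atleast_1d(np.asarray(ages, dtype=float))
    return pd.DataFrame({'fraction': np.exp(-ages / lifetime_years)}, index=pd.Index(ages, name='age'))
```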

list_strat_dependent_files()

Return a dictionary describing the files that the stratospheric LUT depends on.

The keys of this dictionary are the attribute names to use in the LUT netCDF file, and the values are the paths to the files that the LUT depends on. Each file’s SHA1 hash will be stored in the netCDF file under the global attribute named by its key.

For most trace gas records, this will be the Mauna Loa and Samoa flask data files. However, if certain trace gas records depend on other files, this method should be overridden to return the proper dictionary.

Return type

dict
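The hashing step can be sketched with the standard library's hashlib. The helpers below are hypothetical: they show how attribute-name-to-hash pairs could be built from such a dictionary and compared later to decide whether a LUT is out of date. The attribute name 'mlo_file' used in the test is only an example.

```python
import hashlib


def sha1_of_file(path, chunk_size: int = 65536) -> str:
    """SHA1 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def dependency_hash_attrs(dependent_files: dict) -> dict:
    """Map each LUT attribute name to the SHA1 hash of its dependency file."""
    return {attr_name: sha1_of_file(path) for attr_name, path in dependent_files.items()}


def lut_is_stale(stored_attrs: dict, dependent_files: dict) -> bool:
    """True if any dependency is missing from the stored attributes or its hash has changed."""
    current = dependency_hash_attrs(dependent_files)
    return any(stored_attrs.get(name) != sha1 for name, sha1 in current.items())
```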

class ginput.priors.tccon_priors.O2MeanMoleFractionRecord(o2_mole_fraction_file: Union[str, Path] = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ginput/checkouts/latest/ginput/data/o2_mean_dmf.dat'), delay_years: int = 2, max_extrap_years: int = 3, extrap_basis_years: int = 5, auto_update_fo2_file: bool = False, auto_update_td: timedelta = datetime.timedelta(days=7), auto_update_from_tccondata: bool = False)

A record of the global mean O2 dry mole fraction.

This class does not inherit from TraceGasRecord because it does not provide a profile; it provides a global mean value for a given date.

Initialization arguments:

Parameters
  • o2_mole_fraction_file – path to the file containing a timeseries of O2 mole fractions, created by the fo2_prep module (accessed via the “update_fo2” subcommand of run_ginput.py).

  • delay_years – number of years before the target year to exclude from the f(O2) data. This supports reproducible output; see the discussion below.

  • max_extrap_years – number of years after the final year (following truncation) of the f(O2) data to extrapolate.

  • extrap_basis_years – number of years at the end of the f(O2) data (following truncation) to fit for the extrapolation.

  • auto_update_fo2_file – set to True to try automatically updating the f(O2) data file. This is False by default because it requires downloading data from Scripps and NOAA, and our philosophy is that any action taken over the internet should require you to opt in.

  • auto_update_td – how old the f(O2) data file must be before another update is attempted, if auto_update_fo2_file is True. Setting this to None will always try to update the file. If the file does not exist and auto_update_fo2_file = True, then it will always be created.

Note

What are delay_years and max_extrap_years all about? The issue is that there is some latency in the NOAA and Scripps data, and we need to make sure that we can reproduce the same output whenever we run ginput. The NOAA data tends to set the latency, since it is a yearly average. For example, it is Aug 2024 as I write this, and the NOAA global data extends to 2023. It will probably be a few months into 2025 before the 2024 data is available, so if I try to generate priors for 1 Jan 2025 on 2 Jan 2025, the 2024 NOAA data certainly won’t be available, but if I generate those priors on 1 May 2025, the 2024 NOAA data will probably be available. This makes the O2 DMFs dependent on when we run, which isn’t ideal.

The solution is similar to what we do for OCO-2/3 and will eventually do for the primary gases for TCCON: only use data up to a certain number of years before our priors’ date, regardless of whether later data are available. In the example of making priors for 1 Jan 2025, our default is to withhold two years (delay_years=2), so we only use up to 2023 data, which should definitely be available by then. We then extrapolate 3 years by default (max_extrap_years=3), bringing us to 2026. Since we treat the O2 data as being at the midpoint of each year, i.e. 1 July, that ensures that we will always have a data point after any date in 2025.

This does, of course, introduce error into the f(O2) estimation, though at least as of 2024, f(O2) is changing pretty linearly, so the error is small. If that starts to change, then we will revisit this approach. The error is a reasonable price to pay in exchange for reproducible runs.

get_many_o2_mole_fractions(target_dates: DatetimeIndex)

Calculate the O2 mole fractions for a sequence of dates.

All dates must be within the bounds of the available O2 data plus however long we are allowed to extrapolate, or a RuntimeError will be raised.

get_o2_mole_fraction(target_date: Timestamp)

Calculate the O2 mole fraction for a given date.

The date must be within the bounds of the available O2 data plus however long we are allowed to extrapolate, or a RuntimeError will be raised.
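A minimal sketch of interpolating yearly values that are each treated as valid on 1 July, with the out-of-range check raising RuntimeError as described. The helper is hypothetical and assumes the yearly values are in a pandas Series indexed by integer year; ginput's internal data layout may differ.

```python
import numpy as np
import pandas as pd


def o2_at_date(yearly: pd.Series, target_date) -> float:
    """Linearly interpolate yearly O2 mole fractions, each treated as valid on 1 July.

    Hypothetical helper: `yearly` is indexed by integer year (e.g. the output of a
    truncate-and-extrapolate step).
    """
    target = pd.Timestamp(target_date)
    midpoints = pd.to_datetime([f'{int(y)}-07-01' for y in yearly.index])
    if not (midpoints[0] <= target <= midpoints[-1]):
        raise RuntimeError(f'{target.date()} is outside the range covered by the O2 data')
    # Interpolate on days since the epoch so that leap years are handled exactly
    epoch = pd.Timestamp('1970-01-01')
    x = np.asarray((midpoints - epoch) / pd.Timedelta(days=1), dtype=float)
    xt = (target - epoch) / pd.Timedelta(days=1)
    return float(np.interp(xt, x, yearly.to_numpy(dtype=float)))
```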

truncate_and_extrapolate(target_year: int)

Return a copy of the O2 dataframe truncated by self._delay_years and extrapolated.

This implements the logic to ensure consistent O2 mole fractions as new NOAA and Scripps data become available; see the note in the class documentation.

Parameters

target_year – the year of the date we want an O2 mole fraction for.
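The withhold-then-extrapolate logic from the class note can be sketched as below. This assumes yearly f(O2) values in a Series indexed by integer year and a linear fit over the last extrap_basis_years retained years; the function name is hypothetical and the fit ginput actually uses may differ. With the defaults and target_year=2025, data through 2023 are kept (even if 2024 exists) and 2024 through 2026 are extrapolated, matching the example in the note.

```python
import numpy as np
import pandas as pd


def truncate_and_extrapolate_sketch(fo2: pd.Series, target_year: int, delay_years: int = 2,
                                    max_extrap_years: int = 3, extrap_basis_years: int = 5) -> pd.Series:
    """Withhold recent years, then extend the record with a linear fit.

    Hypothetical helper: `fo2` holds yearly O2 mole fractions indexed by integer year.
    """
    # Drop data newer than target_year - delay_years, even if it is available
    kept = fo2[fo2.index <= target_year - delay_years]
    # Fit a line to the last extrap_basis_years of what remains
    basis = kept.iloc[-extrap_basis_years:]
    slope, intercept = np.polyfit(basis.index.to_numpy(dtype=float), basis.to_numpy(dtype=float), 1)
    # Extrapolate max_extrap_years past the final retained year
    last_year = int(kept.index[-1])
    new_years = np.arange(last_year + 1, last_year + 1 + max_extrap_years)
    extrapolated = pd.Series(slope * new_years + intercept, index=new_years)
    return pd.concat([kept, extrapolated])
```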

ginput.priors.tccon_priors.add_strat_prior_standard(prof_gas, retrieval_date, gas_record, mod_data, profs_latency=None, prof_aoa=None, prof_world_flag=None, gas_record_dates=None)

Add the stratospheric trace gas to a TCCON prior profile using the standard approach.

Parameters
  • prof_gas (numpy.ndarray (in ppm)) – the profile trace gas mixing ratios. Will be modified in-place to add the stratospheric component.

  • retrieval_date (datetime.datetime) – the UTC date of the retrieval.

  • gas_record (MloSmoTraceGasRecord) – the Mauna Loa-Samoa CO2 record.

The following parameters are all optional; they are vectors that will be filled with the appropriate values in the stratosphere. They are also returned in the ancillary dictionary; if not given as inputs, they are initialized with NaNs. “nlev” below means the number of levels in the CO2 profile.

Parameters
  • profs_latency – nlev-by-3 array that will store how far forward in time the Mauna Loa/Samoa CO2 record had to be extrapolated, in years. The three columns will respectively contain the mean, min, and max latency.

  • prof_aoa – nlev-element vector of ages of air, in years.

  • prof_world_flag – nlev-element vector of ints which will indicate which levels are considered overworld and which middleworld. The values used for each are defined in mod_constants.
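A sketch of the "fill if given, otherwise create with NaNs" convention for these optional vectors. The helper and its dictionary keys are hypothetical. One assumption worth flagging: the world flag is created as a float array here, because an integer array cannot hold NaN.

```python
import numpy as np


def init_ancillary_profiles(nlev: int, profs_latency=None, prof_aoa=None, prof_world_flag=None) -> dict:
    """Create NaN-filled arrays for any ancillary profiles the caller did not supply.

    Arrays that are supplied are returned as-is, so filling them later modifies the
    caller's arrays in place. The dictionary keys are hypothetical.
    """
    if profs_latency is None:
        profs_latency = np.full((nlev, 3), np.nan)  # columns: mean, min, max latency (years)
    if prof_aoa is None:
        prof_aoa = np.full(nlev, np.nan)  # age of air (years)
    if prof_world_flag is None:
        # Float rather than int so that levels that are never filled can stay NaN
        prof_world_flag = np.full(nlev, np.nan)
    return {'latency': profs_latency, 'age_of_air': prof_aoa, 'world_flag': prof_world_flag}
```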

Returns

the updated CO2 profile and a dictionary of the ancillary profiles.

ginput.priors.tccon_priors.add_trop_prior_standard(prof_gas, obs_date, obs_lat, gas_record, mod_data, ref_lat=45.0, use_theta_eqlat=True, profs_latency=None, prof_aoa=None, prof_world_flag=None, prof_gas_date=None, use_adjusted_zgrid=True, co_source=None)

Add troposphere concentration to the prior profile using the standard approach.

Parameters
  • prof_gas (numpy.ndarray) – the profile trace gas mixing ratios. Will be modified in-place to add the tropospheric component.

  • obs_date (datetime.datetime) – the UTC date of the retrieval.

  • obs_lat (float) – the latitude of the retrieval (degrees, south is negative)

  • gas_record (MloSmoTraceGasRecord) – the Mauna Loa-Samoa record for the desired gas.

  • mod_data – the dictionary of .mod file data. Must have the tropo

  • ref_lat (float) – the reference latitude for age of air. Effectively sets where the age begins, i.e. where the emissions are.

  • use_theta_eqlat (bool) – set to True to use an equivalent latitude derived from the mid-tropospheric potential temperature as the latitude in the age of air and seasonal cycle calculations. This helps correct overly curved profiles at sites near the tropics that sometimes have more tropical-like profiles depending on synoptic scale transport. If this is False, then obs_lat is used directly.

The following parameters are all optional; they are vectors that will be filled with the appropriate values in the troposphere. They are also returned in the ancillary dictionary; if not given as inputs, they are initialized with NaNs. “nlev” below means the number of levels in the CO2 profile.

Parameters
  • profs_latency – nlev-by-3 array that will store how far forward in time the Mauna Loa/Samoa CO2 record had to be extrapolated, in years. The three columns will respectively contain the mean, min, and max latency.

  • prof_aoa – nlev-element vector of ages of air, in years.

  • prof_world_flag – nlev-element vector of ints which will indicate which levels are considered overworld and which middleworld. The values used for each are defined in mod_constants.

  • prof_gas_date – nlev-element vector that stores the date in the MLO/SMO record that the gas was taken from. Since most levels will have a window of dates, this is the middle of those windows. The dates are stored as datetime objects.

  • co_source – unused. This function accepts but ignores this input so that it is consistent with the other add_trop_prior functions; the input is required for CO priors.

Returns

the updated CO2 profile and a dictionary of the ancillary profiles.

ginput.priors.tccon_priors.generate_full_tccon_vmr_file(mod_data, utc_offsets, save_dir, product='fpit', std_vmr_file=None, site_abbrevs='xx', keep_latlon_prec=False, use_existing_luts=False, mlo_smo_files: Optional[dict] = None, **kwargs)

Generate a .vmr file with all the gases required by TCCON (both retrieved and secondary).

mod_data, utc_offsets and site_abbrevs may be single values or collections. See generate_tccon_priors_driver() in this module for details.

Parameters
  • mod_data (dict or str) – a dictionary mimicking that from reading a .mod file or the path to a .mod file

  • utc_offsets (datetime.timedelta or list(datetime.timedelta)) – difference(s) between local time and UTC time for each site

  • save_dir (str) – where to save the .vmr files

  • std_vmr_file (None, str, or bool) – a standard .vmr file that has profiles for all the gases needed by TCCON, as well as their seasonal cycles, latitudinal gradients, and secular trends. These profiles are assumed to be base profiles representative of one latitude/time that can be modified for other locations/times. If this is not given, then the code will try to look for $GGGPATH/vmrs/gnd/summer_35N.vmr. If you do not have GGGPATH defined as an environment variable, an error is raised. You may pass an explicit path to a .vmr file to override that, or False to only write the primary gases to the .vmr file.

  • site_abbrevs (str or list(str)) – abbreviation or list of abbreviations for the sites the .vmr files are being written for.

  • keep_latlon_prec (bool) – by default, latitude/longitude in the .vmr filenames is rounded to the nearest integer. Set this to True to keep 2 decimal places of precision.

  • use_existing_luts (bool) – set to True to avoid recalculating stratospheric LUTs for the MLO/SMO records. Doing so will make it much faster for this to start, but risks using an out-of-date LUT that was generated with old code or input data.

  • mlo_smo_files

    if given, a dictionary with lowercase gas names for keys and subdictionaries for values. The subdictionaries must have the keys “mlo_file” and “smo_file” with paths to the files to use as values. Any gases not included in the dictionary will use the default files. Example:

    {
        'co2': {'mlo_file': './x2019/ml_co2_x2019.txt', 'smo_file': './x2019/smo_co2_x2019.txt'},
        'ch4': {'mlo_file': './test/ml_ch4_test.txt', 'smo_file': './test/smo_ch4_test.txt'}
    }
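A sketch of how the files for one gas might be picked out of mlo_smo_files, assuming (as stated above) that gases absent from the dictionary use the default files. The helper is hypothetical; it returns keyword arguments named like the mlo_file and smo_file parameters in the N2OTropicsRecord signature, with an empty dictionary meaning "use the defaults".

```python
from typing import Optional

REQUIRED_KEYS = ('mlo_file', 'smo_file')


def files_for_gas(gas: str, mlo_smo_files: Optional[dict] = None) -> dict:
    """Return mlo_file/smo_file keyword arguments for one gas.

    Hypothetical helper. An empty dict means 'use the default MLO/SMO files'.
    """
    key = gas.lower()  # keys of mlo_smo_files are lowercase gas names
    if not mlo_smo_files or key not in mlo_smo_files:
        return {}
    entry = mlo_smo_files[key]
    missing = [k for k in REQUIRED_KEYS if k not in entry]
    if missing:
        raise KeyError(f'mlo_smo_files[{key!r}] is missing {missing}')
    return {k: entry[k] for k in REQUIRED_KEYS}
```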
    

Returns

none, writes .vmr files

Raises

GGGPathError – if $GGGPATH is not defined and it needs to find the standard file or it cannot find the standard file in the expected place.
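A sketch of the std_vmr_file resolution rules described above. GGGPathError is redefined locally as a stand-in so the example runs on its own; it is not ginput's class, and the helper name is hypothetical.

```python
import os
from pathlib import Path


class GGGPathError(Exception):
    """Local stand-in for ginput's GGGPathError (illustration only)."""


def find_std_vmr_file(std_vmr_file=None):
    """Resolve the std_vmr_file argument to a path, or None to write only the primary gases."""
    if std_vmr_file is False:
        return None
    if std_vmr_file is not None:
        return Path(std_vmr_file)
    gggpath = os.environ.get('GGGPATH')
    if gggpath is None:
        raise GGGPathError('GGGPATH is not defined, so the standard .vmr file cannot be located')
    candidate = Path(gggpath) / 'vmrs' / 'gnd' / 'summer_35N.vmr'
    if not candidate.is_file():
        raise GGGPathError(f'Standard .vmr file not found at {candidate}')
    return candidate
```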

ginput.priors.tccon_priors.generate_single_tccon_prior(mod_file_data, utc_offset, concentration_record, zgrid=None, use_eqlat_trop=True, use_eqlat_strat=True, use_adjusted_zgrid=True, o2_mole_fraction_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ginput/checkouts/latest/ginput/data/o2_mean_dmf.dat'), auto_update_fo2_file=False)

Driver function to generate the TCCON prior profiles for a single observation.

Parameters
  • mod_file_data (str or dict) – data from a .mod file prepared by Mod Maker. May either be a path to a .mod file, or a dictionary from read_mod_file().

  • utc_offset – a timedelta giving the difference between the file_date and UTC time. For example, if the file_date was given in US Pacific Standard Time, this should be timedelta(hours=-8). This is used to correct the date to UTC to ensure the CO2 from the right time is used.

  • concentration_record (str or MloSmoTraceGasRecord) – which species to generate the prior profile for. Must be either the gas name as a string or an instance of the proper subclass of TraceGasTropicsRecord for the given species. Passing an instance is useful if you are making multiple calls to this function, as it removes the need to instantiate the record during each call.

  • site_abbrev (str) – the two-letter site abbreviation. Currently only used in naming the output file.

  • zgrid (str, numpy.ndarray, or xarray.DataArray) – specifies what altitude grid to interpolate the priors to. May be either a string pointing to an integral*.gnd file or an array of altitudes (in kilometers).

  • use_eqlat_trop (bool) – when True, the latitude used for age-of-air and seasonal cycle calculations is calculated based on the climatology of latitude vs. mid-tropospheric potential temperature. When False, the geographic latitude of the observation is used.

  • use_eqlat_strat (bool) – when True, the stratosphere profiles use equivalent latitude that must be given in the mod data (requires the variable “EL” in the dictionary/mod file). Setting this to False uses the geographic latitude of the observation instead. This allows you to skip the (fairly processor-intensive) equivalent latitude calculation when preparing the .mod files, but can lead to ~2% differences in CO2 near the tropopause (in March).

  • use_adjusted_zgrid (bool) – when True, the altitude grid near the surface will be stretched or compressed in an effort to match the lowest level of the 3D altitude grid to the surface altitude. When False, the altitude grid is used as-is.

  • o2_mole_fraction_file – path to the file containing yearly pre-calculated O2 mole fractions. Pass None as this parameter to skip calculating the O2 DMF. In that case, the ‘global_o2_dry_mole_fraction’ entry in the third returned dictionary will be None.

  • auto_update_fo2_file (bool) – if True, automatically update the f(O2) data file if it is missing or it has been more than 7 days since it was last updated.
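The utc_offset convention above (local time = UTC + offset, so US Pacific Standard Time is timedelta(hours=-8)) means the UTC date is obtained by subtracting the offset. A one-line sketch; the function name is hypothetical.

```python
import datetime as dt


def file_date_to_utc(file_date: dt.datetime, utc_offset: dt.timedelta) -> dt.datetime:
    """Convert a local file_date to UTC given its offset from UTC (local = UTC + offset)."""
    return file_date - utc_offset
```

For example, 16:00 PST on 1 Jan 2020 with utc_offset=timedelta(hours=-8) becomes 00:00 UTC on 2 Jan 2020.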

Returns

a dictionary containing all the profiles (including many for debugging), a dictionary containing the units of the values in each profile, and a dictionary of additional constants.

Return type

dict, dict, dict

ginput.priors.tccon_priors.generate_tccon_priors_driver(mod_data, utc_offsets, species, site_abbrevs='xx', write_vmrs=False, gas_name_order=None, keep_latlon_prec=False, flat_outdir=True, product='fpit', special_header_info: Optional[dict] = None, auto_update_fo2_file=False, **prior_kwargs)

Generate multiple TCCON priors or a file containing multiple gas concentrations.

This function wraps generate_single_tccon_prior() in order to generate priors for one or more gases for one or more sites. The inputs mod_data, utc_offsets, and site_abbrevs determine the number of sites. Each of these must be either a single instance of the correct type or a collection of those types. Any of them given as collections must have the same number of elements; those given as single instances will be used for all profiles.

For example, say that you wanted to generate profiles for three days. mod_data would need to be a list of paths to .mod files, but utc_offsets and site_abbrevs could be a single timedelta and string, respectively.

species can likewise be a single instance or a collection, but in either case will be applied to all sites/times. This determines which species will have profiles generated.
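The single-instance-or-collection rule can be sketched as a broadcasting step. This is a hypothetical helper, not ginput's code; it assumes only lists and tuples count as collections, so a string path, a dict, or a timedelta is reused for every profile.

```python
def broadcast_site_inputs(mod_data, utc_offsets, site_abbrevs) -> list:
    """Expand per-site inputs so each profile gets one value of each.

    Hypothetical helper: lists/tuples are treated as per-site collections; anything
    else (a str path, a dict, a timedelta) is reused for every profile.
    """
    inputs = {'mod_data': mod_data, 'utc_offsets': utc_offsets, 'site_abbrevs': site_abbrevs}
    lengths = {len(v) for v in inputs.values() if isinstance(v, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError(f'Inputs given as collections must have the same length, got lengths {sorted(lengths)}')
    n_profiles = lengths.pop() if lengths else 1
    return [
        {name: (value[i] if isinstance(value, (list, tuple)) else value) for name, value in inputs.items()}
        for i in range(n_profiles)
    ]
```

For the three-day example, three .mod paths with a single timedelta and site abbreviation yield three entries that share the offset and abbreviation.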

Parameters
  • mod_data – input to generate_single_tccon_prior(), see that function.

  • utc_offsets – input to generate_single_tccon_prior(), see that function.

  • species (str, TraceGasTropicsRecord, list(str), or list(TraceGasTropicsRecord)) – either gas names as strings or instances of TraceGasTropicsRecord that set up which gases’ profiles are created. If given as a list, then all species given will be generated for each site/time.

  • site_abbrevs – input to generate_single_tccon_prior(), see that function.

  • write_vmrs – if False, then .vmr files are not written. If truthy, then it must be a path to the directory where the .vmr files are to be written.

  • gas_name_order (list(str)) – the order in which the gases are written to the output files. Currently only affects the .vmr files. See mod_utils.write_vmr_file() for more information.

  • keep_latlon_prec (bool) – if False, then .vmr files written are named with lat/lon rounded to integers. If True, then 2 decimal places are retained.

  • special_header_info – A dictionary giving extra lines to write in the header of the .vmr file. The pairs will be written as “key: value” in the header.

  • prior_kwargs – additional keyword arguments passed on to generate_single_tccon_prior().

Returns

a list of dataframes containing the trace gas profiles for each requested profile.

Return type

Sequence[pandas.DataFrame]

ginput.priors.tccon_priors.get_clams_age(theta, eq_lat, day_of_year, as_timedelta=False, clams_dat={})

Get the age of air predicted by the CLAMS model for points defined by potential temperature and equivalent latitude.

Parameters
  • theta (numpy.ndarray) – a vector of potential temperatures, must be the same length as eq_lat

  • eq_lat (numpy.ndarray) – a vector of equivalent latitudes, must be the same length as theta

  • day_of_year (int) – which day of the year (e.g. Feb 1 = 32) to look up the age for

  • as_timedelta (bool) – set this to True to return the ages as relativedelta instances. When False (default), the ages are returned in fractional years.

  • clams_dat (dict) – a dictionary containing the CLAMS data with keys ‘eqlat’ (l-element vector), ‘theta’ (m-element vector), ‘doy’ (n-element vector), and ‘age’ (l-by-m-by-n array). This can be passed manually if you want to use a custom map of age of air vs. equivalent latitude and theta, but by default will be read in from the CLAMS file provided by Arlyn Andrews and cached.

Returns

a vector of ages the same length as theta and eq_lat. The contents of the vector depend on the value of as_timedelta.

Return type

numpy.ndarray
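A lookup against a clams_dat dictionary with the layout described above ('eqlat', 'theta', 'doy' axes and an l-by-m-by-n 'age' array) can be sketched as grid interpolation. The interpolation scheme here (linear in day of year, then bilinear in equivalent latitude and theta) is an assumption, not necessarily what get_clams_age does, and the helper names are hypothetical. This sketch does not wrap day of year around the end of the year.

```python
import numpy as np


def _bracket(grid: np.ndarray, x):
    """Lower-bracket index and linear weight of x on an ascending grid."""
    i = np.clip(np.searchsorted(grid, x) - 1, 0, len(grid) - 2)
    w = (x - grid[i]) / (grid[i + 1] - grid[i])
    return i, w


def lookup_clams_age(theta, eq_lat, day_of_year, clams_dat: dict) -> np.ndarray:
    """Interpolate an (eqlat, theta, doy) age-of-air table to the given points.

    Linear in day of year, bilinear in equivalent latitude and theta. Points outside
    the grid are linearly extrapolated, and day of year does not wrap around.
    """
    eqlat_grid = np.asarray(clams_dat['eqlat'], dtype=float)
    theta_grid = np.asarray(clams_dat['theta'], dtype=float)
    doy_grid = np.asarray(clams_dat['doy'], dtype=float)
    age = np.asarray(clams_dat['age'], dtype=float)  # shape (l, m, n)

    # Collapse the day-of-year dimension first: this leaves an l-by-m table
    j, wd = _bracket(doy_grid, float(day_of_year))
    age_2d = (1 - wd) * age[:, :, j] + wd * age[:, :, j + 1]

    # Bilinear interpolation in (eq_lat, theta) for every point
    i, wl = _bracket(eqlat_grid, np.asarray(eq_lat, dtype=float))
    k, wt = _bracket(theta_grid, np.asarray(theta, dtype=float))
    return ((1 - wl) * (1 - wt) * age_2d[i, k] + (1 - wl) * wt * age_2d[i, k + 1]
            + wl * (1 - wt) * age_2d[i + 1, k] + wl * wt * age_2d[i + 1, k + 1])
```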

ginput.priors.tccon_priors.get_trop_eq_lat(prof_theta, p_levels, obs_lat, obs_date, theta_wt=1.0, lat_wt=1.0, dtheta_cutoff=0.25, _theta_v_lat={})

Compute the tropospheric equivalent latitude for an observation based on its mid-tropospheric potential temperature.

The rationale for using this approach is described in the module help for backend_analysis/geos_theta_lat.py. This function relies on a climatology created by that module, which should contain the zonal mean relationship between mid-tropospheric potential temperature and latitude at 2 week intervals.

This function finds the equivalent latitude for an observation by looking for the point in the same hemisphere that has the closest mid-tropospheric potential temperature in the climatology as does the profile given as input. Exactly what is defined as mid-troposphere is set by the pressure range in the climatology file, currently it is 700-500 hPa.

This function checks both north and south of the observation latitude for the climatology latitude with the closest potential temperature. As long as one is sufficiently closer to the observation’s potential temperature, that one is chosen directly. If the two are within the limit set by dtheta_cutoff, then a more careful check is necessary. The limit is defined as:

\[|(el_s - l) - (el_n - l)| < d\theta\]

where \(el_s\) and \(el_n\) are the southern and northern latitudes in the climatology with the closest potential temperature to the observations, \(l\) is the observation latitude, and \(d\theta\) is dtheta_cutoff. If this condition is met, then rather than just choosing whichever one has the closer potential temperature, the algorithm uses a cost function:

where \(w_t\) and \(w_l\) are the weights for potential temperature (theta_wt) and latitude (lat_wt) respectively, and \(d\theta\) and \(dl\) are the differences in potential temperature and latitude, respectively, between the observation and the point chosen on the climatology curve.

The goal of this approach is to deal with two cases:

  1. when the theta vs. lat curve from the climatology is monotonically increasing or decreasing

  2. when the curve has a minimum or maximum

For #1, consider a case where theta decreases with latitude, and the observation’s theta is greater than the climatological theta for that latitude. Then going south will match the theta much better, so the cutoff condition is not met, and we automatically choose the southern point.

For #2, consider again a case where the observation’s theta is greater than the climatological theta for that latitude, but now the climatological curve has a minimum just north of the observation. In that case, we may find two equally good matches for the observation’s theta, so, in the absence of other information, we choose the nearer one. This is admittedly a simplification: it is entirely possible that the actual synoptic transport carried air from the farther position, but without a second tracer to differentiate that in the meteorology data, or information on prevailing north/south transport for a given lat/lon, the best assumption is to favor shorter transport.
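The north/south decision described above can be sketched for a single climatology curve (latitude vs. mid-tropospheric theta on the observation's date). Two assumptions, neither confirmed by this documentation: the near-tie test compares the north and south potential temperature mismatches against dtheta_cutoff, following the parameter description below, and the cost is taken as the weighted sum \(w_t |d\theta| + w_l |dl|\), which may not be ginput's exact cost function. The helper name is hypothetical.

```python
import numpy as np


def pick_trop_eq_lat(clim_lat, clim_theta, obs_lat, obs_theta,
                     theta_wt=1.0, lat_wt=1.0, dtheta_cutoff=0.25) -> float:
    """Choose the climatology latitude whose mid-tropospheric theta best matches the observation.

    Hypothetical helper; the cost function in the near-tie branch is an assumed form.
    """
    clim_lat = np.asarray(clim_lat, dtype=float)
    clim_theta = np.asarray(clim_theta, dtype=float)
    # Only search the observation's hemisphere
    same_hemisphere = clim_lat >= 0 if obs_lat >= 0 else clim_lat <= 0

    # Best theta match on each side of the observation: (latitude, |theta mismatch|)
    candidates = []
    for side in (clim_lat <= obs_lat, clim_lat >= obs_lat):  # south, then north
        mask = same_hemisphere & side
        if mask.any():
            lats, thetas = clim_lat[mask], clim_theta[mask]
            best = np.argmin(np.abs(thetas - obs_theta))
            candidates.append((lats[best], abs(thetas[best] - obs_theta)))
    if not candidates:
        raise ValueError('No climatology latitudes in the observation hemisphere')
    if len(candidates) == 1:
        return float(candidates[0][0])

    (lat_s, dth_s), (lat_n, dth_n) = candidates
    if abs(dth_s - dth_n) >= dtheta_cutoff:
        # One side is clearly the better theta match: take it directly
        return float(lat_s if dth_s < dth_n else lat_n)
    # Near tie: penalize both the theta mismatch and the distance travelled
    cost_s = theta_wt * dth_s + lat_wt * abs(lat_s - obs_lat)
    cost_n = theta_wt * dth_n + lat_wt * abs(lat_n - obs_lat)
    return float(lat_s if cost_s <= cost_n else lat_n)
```

With a monotonically decreasing curve the clearly better southern match is taken directly (case #1); with a minimum north of the observation, two equally good matches are resolved in favor of the nearer latitude (case #2).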

Parameters
  • prof_theta (numpy.ndarray) – the profile of potential temperature values associated with this observation

  • p_levels (numpy.ndarray) – the profile of pressure levels that prof_theta is defined on

  • obs_lat (float) – the geographic latitude of the observation

  • obs_date (datetime-like) – the date of the observation

  • theta_wt (float) – a weight to use when deciding between two different latitudes with similar theta values. Increasing this relative to lat_wt will increase the cost for choosing the point with a greater difference in potential temperature.

  • lat_wt (float) – similar to theta_wt, but increasing this prefers the point closer in latitude.

  • dtheta_cutoff (float) – how close the two (north and south) differences between the climatology and observed mid-troposphere potential temperature have to be to take into account which one is closer. See above.

  • _theta_v_lat – not intended to pass in; this is a dictionary that will be given the values read in from the climatology file to cache them for future function calls.

Returns

the equivalent latitude derived from mid-tropospheric potential temperature

Return type

float

ginput.priors.tccon_priors.modify_strat_co(base_co_profile, pres_profile, eqlat_profile, pt_profile, trop_pres, prof_date, model_transition_pressures=(30.0, 10.0), excess_co_lut='/home/docs/checkouts/readthedocs.org/user_builds/ginput/checkouts/latest/ginput/data/meso_co_lut.nc', keep_orig_nans=False)

Takes the baseline GEOS CO profile and adds the mesospheric contribution to the stratosphere.

The GEOS FP-IT product includes CO profiles; however, it does not account for CO drawn down from the mesosphere into the stratosphere. This function adds an estimated contribution of mesospheric CO, derived from ACE-FTS data.

Parameters
  • base_co_profile (array-like) – the baseline CO profile in ppb.

  • pres_profile (array-like) – the pressure levels that the CO profile is defined on.

  • eqlat_profile (array-like) – the equivalent latitude profile, in degrees north.

  • pt_profile (array-like) – the potential temperature profile on the same levels as the CO profile, in Kelvin.

  • trop_pres (float) – the tropopause pressure, in the same units as the pressure profile.

  • prof_date (datetime-like) – the date of the profile

  • excess_co_lut (str) – the path to the lookup table with the excess mesospheric CO. If not given, the standard table file included in the repo is used. This file is created with ginput.priors.backend_analysis.ace_fts_analysis.make_excess_co_lut().

Returns

the CO profile with the CO from mesospheric descent added.

Return type

array-like