This page illustrates how to create a SEF file from climate data digitized in an Excel sheet. We will make use of the functions provided within the Python package pandas
(version 0.23.4). Other packages are also used (all can be installed with the command sudo apt-get install packagename
).
The code described in this page can be downloaded here.
The Excel file (example.xls) that we want to convert into SEF looks like this:
It contains daily observations of temperature (in Fahrenheit) for Detroit, MI (coordinates: 42°19'51.6"N 83°02'45.6"W) for the year 1781 made by George Christian Anthon.
There are many ways to read an Excel file into Python. In this example we are using pandas
to firstly create a Dataframe dfs
. After that all rows from Dataframe will be iterabled to create a list of dictionaries Python.
import pandas as pd def import_excel(input_file_path): """Import the Excel file as input file Args: :param input_file_path: Path of input file Returns: records (:obj:'list'): List of dict """ dfs = pd.read_excel(input_file_path, skiprows=1) dfs = dfs.fillna('') records = [dict(row[1]) for row in dfs.iterrows()] # iterrows to create a list of dictionaries record = [] for record_id in records: record.append(record_id) return records
Please, note that pandas
read the columns 07:00:00 AM, 12:00:00 PM and 20:00:00 PM as datetime.time(7, 0), datetime.time(12, 0) and datetime.time(20, 0), respectively. This will create a list that looks like this:
[
{'Year': 1781, 'Month': 8, 'Day': 1, datetime.time(7, 0): 70.0, datetime.time(12, 0): 79.0, datetime.time(20, 0): 73.0},
{'Year': 1781, 'Month': 8, 'Day': 2, datetime.time(7, 0): 75.0, datetime.time(12, 0): 84.0, datetime.time(20, 0): 77.0},
{'Year': 1781, 'Month': 8, 'Day': 3, datetime.time(7, 0): 74.0, datetime.time(12, 0): 76.0, datetime.time(20, 0): 75.0},
]
Note that empty cells are read as NA
(missing value).
Metadata are as important as the data themselves. Some metadata are required in SEF files, such as station name, geographical coordinates, units, variable and statistic code.
Coordinates in particular is where mistakes are introduced more often during data manipulation. The most common mistake arises from the conversion from degrees-minutes-seconds to decimal format. SEF files require the coordinates to be in decimal format (and the altitude in meters). We create three variables that contain the coordinates in the correct format:
latitude = round(42 + 19 / 60 + 51.6 / 3600, 4)
longitude = round(-(83 + 2 / 60 + 45.6 / 3600), 4)
altitude = 'NA'
Before to prepare the header, we need to define the SEF version:
version = '1.0.0' iversion = [int(x) for x in version.split('.')] if iversion[1] > 0 or iversion[2] > 0: raise IOError("SEF versions > 0.0 are not supported")
The header of the file gets the initial information about SEF file. The header
is a dictionary with some keys and value corresponding. This will be a dictionary that looks like this:
header = { 'SEF': version, 'ID': 'Detroit_Anthon', 'Name': 'Detroit, MI', 'Lat': latitude, 'Lon': longitude, 'Alt': altitude, 'Source': 'C3S-DRS', 'Link': '', 'Vbl': 'ta', 'Stat': 'point', 'Units': 'C', 'Meta': 'Observer=George Christian Anthon', }
Note that this is the main dictionary where the temp_dict
variable will be appended to it (Step 6).
We need to convert the observation time to UTC time. So we use the following code to convert it and then set the format YYYY-MM-DD HH:MM:SS
for the date_time
variable.
time_offset = longitude * 12 / 180
date = str(records[index]['Year']) \ + "-" + str(records[index]['Month']) \ + "-" + str(records[index]['Day']) \ + " " + str(times[index_times])
date_time = datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
utc = date_time - datetime.timedelta(hours=time_offset) //input: 1781-8-1 07:00:00 //output: 1781-08-01 12:32:11
The utc
variable is datetime.datetime
object. To get the year
, month
, day
, hour
and minutes
variables we need to convert it to str
:
year = str(utc)[:4] month = str(utc)[5:7] day = str(utc)[8:10] hour = str(utc)[11:13] minutes = str(utc)[14:16]
It is recommended to convert the data into metric units. This can be done easily with the following commands:
temperatures = []
if isinstance(temperatures[index_temperatures], str): value = 'NA' else: value = round(((float(temperatures[index_temperatures]) - 32) * 5 / 9), 1)
Some values can be NA
, so we must to verify if the value exists before to perform the conversion.
The two following for
loop will create the lines of Meta table:
times = [datetime.time(7, 0), datetime.time(12, 0), datetime.time(20, 0)] original_time = ["7:00AM", "12:00PM", "20:00PM"] for index in range(len(records)): temperatures.append(records[index][datetime.time(7, 0)]) temperatures.append(records[index][datetime.time(12, 0)]) temperatures.append(records[index][datetime.time(20, 0)]) for time in original_time: if isinstance(temperatures[index_temperatures], str): value = 'NA' else: value = round(((float(temperatures[index_temperatures]) - 32) * 5 / 9), 1) date = str(records[index]['Year']) \ + "-" + str(records[index]['Month']) \ + "-" + str(records[index]['Day']) \ + " " + str(times[index_times]) date_time = datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S') utc = date_time - datetime.timedelta(hours=time_offset) year = str(utc)[:4] month = str(utc)[5:7] day = str(utc)[8:10] hour = str(utc)[11:13] minutes = str(utc)[14:16] data_dict = { 'Data': pd.DataFrame({ 'Year': year, 'Month': month, 'Day': day, 'Hour': hour, 'Minute': minutes, 'Period': 0, 'Value': value, 'Meta': "orig=" + str(temperatures[index_temperatures]) + 'F' + "|orig.time=" + str(time) + "|orig.date=" + str(records[index]['Year']) + '-' + str(records[index]['Month']) + '-' + str(records[index]['Day']) }, index=[0]) } temp_dict['Data'].append(data_dict['Data']) index_times += 1 if index_times > 2: index_times = 0 index_temperatures += 1 header.update(temp_dict)
The header
will be processed by this function to be written to the output file in which the table's columns are defined:
def write_file(obs, file_name): """Write the specified set of obs to a file in SEF format. Args: obs (:obj: 'dict'): Dictionary file_name (:obj:'str'): File (or 'open'able object) Returns: :obj:'dict': Data as key:value pairs. Raises: ValueError: obs not a SEF structure """ try: version = obs['SEF'] iversion = [int(x) for x in version.split('.')] if iversion[1] > 0 or iversion[2] > 0: raise IOError("SEF versions > 0.0 are not supported") except: raise ValueError("This does not look like a SEF data structure") # Operate on local copy obs = copy.deepcopy(obs) f = codecs.open(file_name, 'w', encoding='utf-8') # Meta might need packing obs['Meta'] = _pack(obs['Meta']) # Header first for header in ('SEF', 'ID', 'Name', 'Source', 'Lat', 'Lon', 'Alt', 'Link', 'Vbl', 'Stat', 'Units', 'Meta'): if obs[header] is not None and obs[header] == obs[header]: f.write("%s\t%s\n" % (header, obs[header])) else: f.write("%s\t\n" % header) obs['Data'][0].to_csv(f, sep='\t', header=True, index=False) for i in range(1, len(obs['Data'])): obs['Data'][i].to_csv(f, sep='\t', header=False, index=False,) def _pack(m_list): if m_list is None: return m_list elif isinstance(m_list, str): return m_list else: return ','.join(m_list)
The resulting SEF files should look like this: