Tutorial for the conversion of an Excel sheet into Station Exchange Format using Python


Introduction

This page illustrates how to create a SEF file from climate data digitized in an Excel sheet. We will make use of the functions provided within the Python package pandas (version 0.23.4). Other packages are also used (all can be installed with the command sudo apt-get install packagename). The code described in this page can be downloaded here.

Excel file

The Excel file (example.xls) that we want to convert into SEF looks like this:

It contains daily observations of temperature (in Fahrenheit) for Detroit, MI (coordinates: 42°19'51.6"N 83°02'45.6"W) for the year 1781 made by George Christian Anthon.

Step 1: Read the Excel file into Python

There are many ways to read an Excel file into Python. In this example we are using pandas to firstly create a Dataframe dfs. After that all rows from Dataframe will be iterabled to create a list of dictionaries Python.

import pandas as pd

def import_excel(input_file_path):
"""Import the Excel file as input file

Args:
	:param input_file_path: Path of input file

Returns:
	records (:obj:'list'): List of dict
"""
	
    dfs = pd.read_excel(input_file_path, skiprows=1)
    dfs = dfs.fillna('')
    records = [dict(row[1]) for row in dfs.iterrows()] # iterrows to create a list of dictionaries
    record = []
    for record_id in records:
        record.append(record_id)

    return records

Please, note that pandas read the columns 07:00:00 AM, 12:00:00 PM and 20:00:00 PM as datetime.time(7, 0), datetime.time(12, 0) and datetime.time(20, 0), respectively. This will create a list that looks like this:

[
{'Year': 1781, 'Month': 8, 'Day': 1, datetime.time(7, 0): 70.0, datetime.time(12, 0): 79.0, datetime.time(20, 0): 73.0},
{'Year': 1781, 'Month': 8, 'Day': 2, datetime.time(7, 0): 75.0, datetime.time(12, 0): 84.0, datetime.time(20, 0): 77.0},
{'Year': 1781, 'Month': 8, 'Day': 3, datetime.time(7, 0): 74.0, datetime.time(12, 0): 76.0, datetime.time(20, 0): 75.0},
]

Note that empty cells are read as NA (missing value).

Step 2: Prepare the metadata

Metadata are as important as the data themselves. Some metadata are required in SEF files, such as station name, geographical coordinates, units, variable and statistic code.

Coordinates in particular is where mistakes are introduced more often during data manipulation. The most common mistake arises from the conversion from degrees-minutes-seconds to decimal format. SEF files require the coordinates to be in decimal format (and the altitude in meters). We create three variables that contain the coordinates in the correct format:

latitude = round(42 + 19 / 60 + 51.6 / 3600, 4) 
longitude = round(-(83 + 2 / 60 + 45.6 / 3600), 4)
altitude = 'NA'

Step 3: Prepare the SEF file header

Before to prepare the header, we need to define the SEF version:

version = '1.0.0'

iversion = [int(x) for x in version.split('.')]
if iversion[1] > 0 or iversion[2] > 0:
	raise IOError("SEF versions > 0.0 are not supported")

The header of the file gets the initial information about SEF file. The header is a dictionary with some keys and value corresponding. This will be a dictionary that looks like this:

header = {
        'SEF': version, 'ID': 'Detroit_Anthon', 'Name': 'Detroit, MI',
        'Lat': latitude, 'Lon': longitude, 'Alt': altitude, 'Source': 'C3S-DRS',
        'Link': '', 'Vbl': 'ta', 'Stat': 'point',
        'Units': 'C', 'Meta': 'Observer=George Christian Anthon',
    }
						

Note that this is the main dictionary where the temp_dict variable will be appended to it (Step 6).

Step 4: Add time and date

We need to convert the observation time to UTC time. So we use the following code to convert it and then set the format YYYY-MM-DD HH:MM:SS for the date_time variable.

time_offset = longitude * 12 / 180
date = str(records[index]['Year']) \ + "-" + str(records[index]['Month']) \ + "-" + str(records[index]['Day']) \ + " " + str(times[index_times])
date_time = datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
utc = date_time - datetime.timedelta(hours=time_offset) //input: 1781-8-1 07:00:00 //output: 1781-08-01 12:32:11

The utc variable is datetime.datetime object. To get the year, month, day, hour and minutes variables we need to convert it to str:

year = str(utc)[:4]
month = str(utc)[5:7]
day = str(utc)[8:10]
hour = str(utc)[11:13]
minutes = str(utc)[14:16]

Step 5: Convert Fahrenheit to Celsius

It is recommended to convert the data into metric units. This can be done easily with the following commands:

temperatures = []
if isinstance(temperatures[index_temperatures], str): value = 'NA' else: value = round(((float(temperatures[index_temperatures]) - 32) * 5 / 9), 1)

Some values can be NA, so we must to verify if the value exists before to perform the conversion.

Step 6: Write SEF files

The two following for loop will create the lines of Meta table:

    
    times = [datetime.time(7, 0), datetime.time(12, 0), datetime.time(20, 0)]
    original_time = ["7:00AM", "12:00PM", "20:00PM"]

    for index in range(len(records)):
        temperatures.append(records[index][datetime.time(7, 0)])
        temperatures.append(records[index][datetime.time(12, 0)])
        temperatures.append(records[index][datetime.time(20, 0)])
        for time in original_time:
            if isinstance(temperatures[index_temperatures], str):
                value = 'NA'
            else:
                value = round(((float(temperatures[index_temperatures]) - 32) * 5 / 9), 1)

            date = str(records[index]['Year']) \
                + "-" + str(records[index]['Month']) \
                + "-" + str(records[index]['Day']) \
                + " " + str(times[index_times])

            date_time = datetime.datetime.strptime(date, '%Y-%m-%d %H:%M:%S')

            utc = date_time - datetime.timedelta(hours=time_offset)

            year = str(utc)[:4]
            month = str(utc)[5:7]
            day = str(utc)[8:10]
            hour = str(utc)[11:13]
            minutes = str(utc)[14:16]

            data_dict = {
                'Data': pd.DataFrame({
                    'Year': year,
                    'Month': month,
                    'Day': day,
                    'Hour': hour,
                    'Minute': minutes,
                    'Period': 0,
                    'Value': value,
                    'Meta': "orig=" + str(temperatures[index_temperatures])
                            + 'F' + "|orig.time=" + str(time)
                            + "|orig.date=" + str(records[index]['Year']) + '-' + str(records[index]['Month'])
                            + '-' + str(records[index]['Day'])

                }, index=[0])
            }
            temp_dict['Data'].append(data_dict['Data'])

            index_times += 1
            if index_times > 2:
                index_times = 0

            index_temperatures += 1

        header.update(temp_dict)

Step 6.1: The write_file function

The header will be processed by this function to be written to the output file in which the table's columns are defined:

def write_file(obs, file_name):
    """Write the specified set of obs to a file in SEF format.

    Args:
        obs (:obj: 'dict'): Dictionary
        file_name (:obj:'str'): File (or 'open'able object)

    Returns:
        :obj:'dict': Data as key:value pairs.

    Raises:
        ValueError: obs not a SEF structure
    """
    try:
        version = obs['SEF']
        iversion = [int(x) for x in version.split('.')]

        if iversion[1] > 0 or iversion[2] > 0:
            raise IOError("SEF versions > 0.0 are not supported")
    except:
        raise ValueError("This does not look like a SEF data structure")

    # Operate on local copy
    obs = copy.deepcopy(obs)
    f = codecs.open(file_name, 'w', encoding='utf-8')
    # Meta might need packing
    obs['Meta'] = _pack(obs['Meta'])
    # Header first
    for header in ('SEF', 'ID', 'Name', 'Source', 'Lat', 'Lon', 'Alt', 'Link', 'Vbl',
                   'Stat', 'Units', 'Meta'):
        if obs[header] is not None and obs[header] == obs[header]:

            f.write("%s\t%s\n" % (header, obs[header]))
        else:
            f.write("%s\t\n" % header)

    obs['Data'][0].to_csv(f, sep='\t', header=True, index=False)

    for i in range(1, len(obs['Data'])):
        obs['Data'][i].to_csv(f, sep='\t', header=False, index=False,)


def _pack(m_list):
    if m_list is None:
        return m_list
    elif isinstance(m_list, str):
        return m_list
    else:
        return ','.join(m_list)
				

The resulting SEF files should look like this: