Convolutional Neural Network Classification Methods
The outstanding success of Convolutional Neural Network (CNN) image classification in the last few years has led to the application of this technique to an extensive variety of objects. In particular, deep learning techniques became very powerful after the revolutionary AlexNet model was created in 2012 to improve the results of the ImageNet challenge. Since the introduction of large-scale visual datasets like ImageNet, most success in computer vision has been driven primarily by supervised learning. CNN image classification methods achieve high accuracy, but because they are based on supervised machine learning, they require labeling huge volumes of data. One solution to this challenge is transfer learning. Fine-tuning a pretrained network with transfer learning is usually much faster and more accurate than training a CNN image classification model from scratch.
In this post we examine how to apply CNN image classification transfer learning methods to climate data analysis.
CNN Classification of Embedded Vectors
In this post we will use a deep learning technique that we learned in the fast.ai 'Practical Deep Learning for Coders, v3' class and the fast.ai forum 'Time series/ sequential data' study group.
In our previous posts we applied this technique to Natural Language Processing ("Free Associations - Find Unexpected Word Pairs via Convolutional Neural Network" and "Word2Vec2Graph to Images to Deep Learning") and to electroencephalography data analysis ("EEG Patterns by Deep Learning and Graph Mining").
CNN Classification Method
For the classification method we will convert each year of daily temperature data to an image using a Gramian Angular Summation Field (GASF), an encoding based on a polar coordinate transformation. This method is well described by Ignacio Oguiza in the fast.ai forum thread 'Time series classification: General Transfer Learning with Convolutional Neural Networks'. He referenced the paper "Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks". For data processing we will use ideas and code from Ignacio Oguiza's GitHub notebook "Time series - Olive oil country".
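As a rough illustration of the GASF idea, here is a minimal NumPy sketch. It is not the code from the referenced notebook: the function name `gasf` and the synthetic temperature curve are our own assumptions. The series is rescaled to [-1, 1], each value is read as the cosine of an angle, and the image is built from cosines of pairwise angle sums.

```python
import numpy as np

def gasf(series):
    """Encode a 1-D time series as a Gramian Angular Summation Field image.

    Illustrative helper (not from the referenced notebook); it assumes the
    series is not constant, otherwise the rescaling divides by zero.
    """
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so that every value is a valid cosine.
    lo, hi = x.min(), x.max()
    x = 2.0 * (x - lo) / (hi - lo) - 1.0
    # Polar encoding: each value becomes an angle phi = arccos(x).
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # GASF[i, j] = cos(phi_i + phi_j), computed for all pairs by broadcasting.
    return np.cos(phi[:, None] + phi[None, :])

# Synthetic example: one "year" of daily temperatures with a seasonal cycle.
days = np.arange(365)
temps = 10.0 + 15.0 * np.sin(2 * np.pi * days / 365)
image = gasf(temps)
print(image.shape)  # (365, 365)
```

A year of 365 daily values therefore becomes a symmetric 365 x 365 matrix with entries in [-1, 1], which can be rendered as a picture.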
Data Preprocessing
Data Source
To demonstrate how these methods work we will use climate data from a kaggle.com data set: "Temperature History of 1000 cities 1980 to 2020". This data set has the average daily temperature in degrees Celsius from January 1, 1980 to September 30, 2020 for the 1000 most populous cities in the world.
Transform Raw Data to Daily Temperature by Year Vectors
The raw data of average daily temperature for 1000 cities is represented in 1001 columns, with city metadata rows and average temperature rows for all dates from January 1, 1980 to September 30, 2020. To experiment with the classification method we'll calculate 'zone' metadata, assigning each city to the Tropical, North or South region based on latitude.
As city metadata we will use the following columns:
- City
- Country
- Latitude
- Longitude
- Zone
To bring the raw data to a uniform format, we applied the following steps:
- To get the same data format for each time series, we excluded February 29 rows from the raw data
- As we had data only until September 30, 2020, we excluded data for year 2020
- From dates formatted as 'mm/dd/yyyy' strings we extracted the year as a 'yyyy' string
The resulting table has:
- Metadata columns: city, latitude, longitude, country, zone, year
- 365 columns with average daily temperatures
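The zone assignment and the yearly reshaping described above can be sketched in plain Python. This is only an illustration under assumptions: the 23.5 degree boundary for the Tropical zone, the function names `assign_zone` and `yearly_vectors`, and the synthetic constant temperatures are ours and are not taken from the original code.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed zone boundary (approximate latitude of the tropics); the
# threshold used in the original code is not shown in the text.
TROPIC_LATITUDE = 23.5

def assign_zone(latitude):
    """Map a latitude in degrees to 'North', 'South' or 'Tropical'."""
    if latitude > TROPIC_LATITUDE:
        return "North"
    if latitude < -TROPIC_LATITUDE:
        return "South"
    return "Tropical"

def yearly_vectors(dates, temps, last_full_year=2019):
    """Group one city's daily temperatures into per-year lists.

    dates are 'mm/dd/yyyy' strings and temps are strings or floats.
    February 29 and incomplete years after last_full_year are skipped,
    so every complete year yields exactly 365 values.
    """
    by_year = defaultdict(list)
    for day_string, temp in zip(dates, temps):
        month, day, year = day_string.split("/")
        if (int(month), int(day)) == (2, 29):
            continue
        if int(year) > last_full_year:
            continue
        by_year[year].append(float(temp))
    return dict(by_year)

# Synthetic series covering the same date range as the data set.
start, end = date(1980, 1, 1), date(2020, 9, 30)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
dates = [d.strftime("%m/%d/%Y") for d in days]
temps = ["20.0"] * len(dates)

vectors = yearly_vectors(dates, temps)
print(len(vectors), len(vectors["1980"]))  # 40 365
print(assign_zone(55.75), assign_zone(1.35), assign_zone(-33.9))
```

Each city then contributes up to 40 rows (years 1980 to 2019), each with 365 temperature values plus the metadata columns.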
Prepare Training Data by Zone
We will classify daily temperature time series by North, South and Tropical zones. The cities are not evenly distributed across zones: about one third are located in the Tropical region, most are in the Northern region and only a few are in the Southern region:
- Northern region: 614 cities
- Tropical region: 340 cities
- Southern region: 46 cities
Data Preparation for Southern Region
We will describe the data preparation process for the Southern region in detail. For the other two regions the data preparation processes are very similar. As the raw data has only 46 cities in the Southern region, we will use all of this data for training. Here are the steps:
- Transform temperature values from string format to float format
- Get a subset of data for Southern region
- Split data to metadata and data values
- Transform data values to numpy format
- Transform daily temperature float arrays to pictures
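The steps above can be sketched as follows. This is a hedged illustration, not the original notebook code: the record layout, the city names, the helper `to_gasf_image` and the random temperatures are assumptions made for the example, and the "pictures" are kept as 8-bit arrays rather than written to image files.

```python
import numpy as np

def to_gasf_image(values):
    """Turn one temperature vector into an 8-bit grayscale GASF array."""
    x = np.asarray(values, dtype=float)
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))                # polar angles
    field = np.cos(phi[:, None] + phi[None, :])           # values in [-1, 1]
    return np.clip(np.round((field + 1.0) * 127.5), 0, 255).astype(np.uint8)

# Synthetic records in the yearly format, with temperatures as strings.
rng = np.random.default_rng(0)
records = []
for city, zone in [("Sydney", "South"), ("Moscow", "North"), ("Cape Town", "South")]:
    temps = rng.normal(15.0, 5.0, size=365)
    records.append({"city": city, "zone": zone, "year": "1999",
                    "temps": [f"{t:.1f}" for t in temps]})

# 1. Transform temperature values from strings to floats.
for record in records:
    record["temps"] = [float(t) for t in record["temps"]]

# 2. Get the subset of data for the Southern region.
south = [r for r in records if r["zone"] == "South"]

# 3. Split the data into metadata and data values.
meta = [(r["city"], r["year"]) for r in south]

# 4. Transform the data values to NumPy format.
values = np.array([r["temps"] for r in south])

# 5. Transform the daily temperature arrays to pictures.
images = np.stack([to_gasf_image(v) for v in values])
print(values.shape, images.shape)  # (2, 365) (2, 365, 365)
```

In a real pipeline each array would be saved as an image file in a folder named after its zone, so that an image classifier can read the labels from the folder structure.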
Tropical and North Regions
For Tropical and Northern regions we will shuffle data and select about 2000 rows, which keeps them comparable in size to the Southern region (at most 46 cities x 40 years = 1840 rows).
Tropical region example: plot and GASF images for daily weather data in Singapore in 1999.
Northern region example: plot and GASF images for daily weather data in Moscow in 2013.
Image Classification
Training Data Preparation
For the time series classification model we used the transfer learning approach from the fast.ai library. Here is the code to prepare data for model training and show some Southern region data examples:
Model Training
Code based on the fast.ai library to train the time series classification model and save the results:
Interpretation of CNN Classification Model
CNN Model Accuracy Statistics
The error rate of the model is 6.8%, so the accuracy is about 93%. To calculate accuracy statistics, we read the data, ran it through the model and split the results by zone. For each resulting table we calculated statistics. Then we combined the results by taking the North region probability ('predNorth' column) for Northern region statistics, the Tropical region probability for Tropical region statistics and the South region probability for Southern region statistics.
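A minimal sketch of this kind of per-zone summary is shown below. It assumes the model output is available as an array of class probabilities with one column per zone (column 0 playing the role of 'predNorth'). The random probabilities, the column order and the choice of min, max and mean as the statistics are illustrative assumptions, not results from the model.

```python
import numpy as np

ZONES = ["North", "Tropical", "South"]  # assumed column order of the probabilities

# Synthetic stand-ins for the true labels and the predicted probabilities.
rng = np.random.default_rng(1)
true_zone = rng.choice(ZONES, size=300)
probs = rng.dirichlet(np.ones(len(ZONES)), size=300)  # each row sums to 1

def zone_statistics(true_zone, probs):
    """For each zone, summarize the probability assigned to the true zone."""
    stats = {}
    for col, zone in enumerate(ZONES):
        own = probs[true_zone == zone, col]  # e.g. 'predNorth' for Northern rows
        stats[zone] = {"min": own.min(), "max": own.max(), "mean": own.mean()}
    return stats

stats = zone_statistics(true_zone, probs)

# Overall accuracy: share of rows whose most probable zone is the true zone.
predicted = np.array(ZONES)[probs.argmax(axis=1)]
accuracy = (predicted == true_zone).mean()

for zone, s in stats.items():
    print(zone, round(s["min"], 3), round(s["mean"], 3), round(s["max"], 3))
print("accuracy:", round(accuracy, 3))
```

The rows with the minimum, maximum and closest-to-mean probability in each zone are natural candidates for example images.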
Here are max, min and mean image examples for Northern, Southern and Tropical regions: