Multimodal-Emotion-Recognition/01-Audio/Notebook/SVM/01 - Preprocessing [SVM].ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Speech Emotion Recognition - Signal Preprocessing\n",
    "\n",
    "A project for the French Employment Agency\n",
    "\n",
    "Telecom ParisTech 2018-2019"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## I. Context"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook sets up the preprocessing and audio feature extraction steps for speech emotion recognition.\n",
    "\n",
    "### Audio features:\n",
    "The implemented short-term features are listed below:\n",
    "- **Zero Crossing Rate**: The rate of sign changes of the signal within a frame.\n",
    "- **Energy**: The sum of squares of the signal values, normalized by the frame length.\n",
    "- **Entropy of Energy**: The entropy of the sub-frames' normalized energies. It can be interpreted as a measure of abrupt changes.\n",
    "- **Spectral Centroid**: The center of gravity of the spectrum.\n",
    "- **Spectral Spread**: The second central moment of the spectrum.\n",
    "- **Spectral Entropy**: The entropy of the normalized spectral energies for a set of sub-frames.\n",
    "- **Spectral Flux**: The squared difference between the normalized magnitudes of the spectra of two successive frames.\n",
    "- **Spectral Rolloff**: The frequency below which 90% of the magnitude distribution of the spectrum is concentrated.\n",
    "- **MFCCs**: Mel Frequency Cepstral Coefficients form a cepstral representation in which the frequency bands are not linear but distributed according to the mel scale.\n",
    "\n",
    "Global statistics are then computed on the features above:\n",
    "- **mean, std, med, kurt, skew, q1, q99, min, max and range**\n",
    "\n",
    "### Data:\n",
    "**RAVDESS**: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes *calm*, *happy*, *sad*, *angry*, *fearful*, *surprise*, and *disgust* expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. (https://zenodo.org/record/1188976#.XA48aC17Q1J)"
   ]
  },
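  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some of the definitions above can be sketched for a single frame with NumPy. This is a minimal illustration and not the `AudioLibrary` implementation: the helper name `frame_features`, its arguments and the use of the magnitude (rather than power) spectrum are assumptions made for the example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def frame_features(frame, sample_rate, rolloff=0.90, eps=1e-10):\n",
    "    # Hypothetical helper: a few short-term features for one frame\n",
    "    frame = np.asarray(frame, dtype=float)\n",
    "    # Zero crossing rate: share of adjacent sample pairs with different signs\n",
    "    signs = np.sign(frame)\n",
    "    zcr = np.mean(signs[:-1] != signs[1:])\n",
    "    # Energy: sum of squares normalized by the frame length\n",
    "    energy = np.sum(frame ** 2) / len(frame)\n",
    "    # Magnitude spectrum, bin frequencies and normalized weights\n",
    "    mag = np.abs(np.fft.rfft(frame))\n",
    "    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)\n",
    "    weights = mag / (np.sum(mag) + eps)\n",
    "    # Spectral centroid: center of gravity of the spectrum\n",
    "    centroid = np.sum(freqs * weights)\n",
    "    # Spectral spread: second central moment around the centroid\n",
    "    spread = np.sum((freqs - centroid) ** 2 * weights)\n",
    "    # Spectral rolloff: first frequency reaching 90% of the cumulative magnitude\n",
    "    cumulative = np.cumsum(mag)\n",
    "    idx = np.searchsorted(cumulative, rolloff * cumulative[-1])\n",
    "    return {'zcr': zcr, 'energy': energy, 'spectral_centroid': centroid,\n",
    "            'spectral_spread': spread, 'spectral_rolloff': freqs[idx]}\n",
    "\n",
    "# Usage on a synthetic 25 ms frame: a 1 kHz tone sampled at 16 kHz\n",
    "n = np.arange(400)\n",
    "tone = np.sin(2 * np.pi * 1000 * n / 16000)\n",
    "tone_features = frame_features(tone, 16000)\n",
    "```\n",
    "\n",
    "For this pure tone, the centroid and the rolloff both land on 1000 Hz and the energy is 0.5, which matches the intuition behind each definition."
   ]
  },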
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## II. General import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-04-15T13:13:31.470677Z",
     "start_time": "2019-04-15T13:13:30.911103Z"
    }
   },
   "outputs": [],
   "source": [
    "### General imports ###\n",
    "from glob import glob\n",
    "import os\n",
    "import pickle\n",
    "import itertools\n",
    "import numpy as np\n",
    "\n",
    "### Audio preprocessing imports ###\n",
    "from AudioLibrary.AudioSignal import *\n",
    "from AudioLibrary.AudioFeatures import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-12-04T16:38:44.580314Z",
     "start_time": "2018-12-04T16:38:44.560062Z"
    }
   },
   "source": [
    "## III. Set labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-04-15T13:13:31.477659Z",
     "start_time": "2019-04-15T13:13:31.473279Z"
    }
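   },
   "outputs": [],
   "source": [
    "# RAVDESS emotion codes (third filename field) mapped to labels.\n",
    "# In the RAVDESS naming scheme '01' is neutral and '02' is calm;\n",
    "# here '02' (calm) is labeled 'NEU' and files with code '01' get no label.\n",
    "label_dict_ravdess = {'02': 'NEU', '03':'HAP', '04':'SAD', '05':'ANG', '06':'FEA', '07':'DIS', '08':'SUR'}\n",
    "\n",
    "# Set audio file labels\n",
    "def set_label_ravdess(audio_file, gender_differentiation):\n",
    "    # File names look like '03-01-06-01-02-01-12.wav': characters 6-7 hold the emotion code\n",
    "    label = label_dict_ravdess.get(audio_file[6:-16])\n",
    "    if gender_differentiation:\n",
    "        # Characters 18-19 hold the actor ID: even = female, odd = male\n",
    "        if int(audio_file[18:-4]) % 2 == 0:\n",
    "            label = 'f_' + label\n",
    "        else:\n",
    "            label = 'm_' + label\n",
    "    return label"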
   ]
  },
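  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The slicing used in the labeling function relies on the fixed-width RAVDESS file names, which consist of seven two-digit fields separated by hyphens. As a minimal sketch, the same information can be read by splitting on the hyphens; the helper `parse_ravdess_name` and the field names below are illustrative and not part of the notebook.\n",
    "\n",
    "```python\n",
    "def parse_ravdess_name(audio_file):\n",
    "    # Hypothetical helper: split '03-01-06-01-02-01-12.wav' into named fields\n",
    "    stem = audio_file.rsplit('.', 1)[0]\n",
    "    fields = stem.split('-')\n",
    "    names = ['modality', 'vocal_channel', 'emotion', 'intensity',\n",
    "             'statement', 'repetition', 'actor']\n",
    "    return dict(zip(names, fields))\n",
    "\n",
    "example_name = '03-01-06-01-02-01-12.wav'\n",
    "info = parse_ravdess_name(example_name)\n",
    "# Even actor IDs are female, odd actor IDs are male\n",
    "gender = 'f' if int(info['actor']) % 2 == 0 else 'm'\n",
    "# The fixed-position slices pick out the same two fields\n",
    "same_emotion = example_name[6:-16] == info['emotion']\n",
    "same_actor = example_name[18:-4] == info['actor']\n",
    "```\n",
    "\n",
    "Splitting on the separator is a little more robust than fixed offsets if a file name carries a different extension or a prefix."
   ]
  },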
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IV. Import audio files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-04-15T13:13:36.852703Z",
     "start_time": "2019-04-15T13:13:31.479656Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Import Data: START\n",
      "Import Data: RUNNING ... 0 files\n",
      "Import Data: RUNNING ... 200 files\n",
      "Import Data: RUNNING ... 300 files\n",
      "Import Data: RUNNING ... 400 files\n",
      "Import Data: RUNNING ... 500 files\n",
      "Import Data: RUNNING ... 600 files\n",
      "Import Data: RUNNING ... 700 files\n",
      "Import Data: RUNNING ... 800 files\n",
      "Import Data: RUNNING ... 900 files\n",
      "Import Data: RUNNING ... 1000 files\n",
      "Import Data: RUNNING ... 1100 files\n",
      "Import Data: RUNNING ... 1200 files\n",
      "Import Data: RUNNING ... 1300 files\n",
      "Import Data: RUNNING ... 1400 files\n",
      "Import Data: END \n",
      "\n",
      "Number of audio files imported: 1344\n"
     ]
    }
   ],
   "source": [
    "# Start data import\n",
    "print(\"Import Data: START\")\n",
    "\n",
    "# Audio file path and names\n",
    "file_path = '../Datas/RAVDESS/'\n",
    "file_names = os.listdir(file_path)\n",
    "\n",
    "# Initialize signal and labels lists\n",
    "signal = []\n",
    "labels = []\n",
    "\n",
    "# Sample rate (44.1 kHz)\n",
    "sample_rate = 44100\n",
    "\n",
    "# Read every audio file whose emotion code has a label\n",
    "for audio_index, audio_file in enumerate(file_names):\n",
    "\n",
    "    # Select audio file\n",
    "    if audio_file[6:-16] in label_dict_ravdess:\n",
    "\n",
    "        # Read audio file\n",
    "        signal.append(AudioSignal(sample_rate, filename=file_path + audio_file))\n",
    "\n",
    "        # Set label\n",
    "        labels.append(set_label_ravdess(audio_file, True))\n",
    "\n",
    "        # Print progress (audio_index counts every directory entry, including skipped files)\n",
    "        if audio_index % 100 == 0:\n",
    "            print(\"Import Data: RUNNING ... {} files\".format(audio_index))\n",
    "\n",
    "# Cast labels to array\n",
    "labels = np.asarray(labels).ravel()\n",
    "\n",
    "# End data import\n",
    "print(\"Import Data: END \\n\")\n",
    "print(\"Number of audio files imported: {}\".format(labels.shape[0]))"
   ]
  },
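  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the import, it can be useful to check how many examples each class received. The sketch below uses toy labels in place of the notebook's `labels` array; the variable names are illustrative only.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Toy labels standing in for the imported `labels` array\n",
    "toy_labels = np.array(['f_HAP', 'm_SAD', 'f_HAP', 'm_ANG', 'm_SAD', 'f_HAP'])\n",
    "\n",
    "# Count the occurrences of each distinct label\n",
    "classes, counts = np.unique(toy_labels, return_counts=True)\n",
    "label_counts = dict(zip(classes.tolist(), counts.tolist()))\n",
    "```\n",
    "\n",
    "Running the same two lines on `labels` gives the class balance of the imported data."
   ]
  },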
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## V. Audio features extraction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-04-15T13:13:36.863481Z",
     "start_time": "2019-04-15T13:13:36.855871Z"
    }
   },
   "outputs": [],
   "source": [
    "# Audio features extraction function\n",
    "# Note: nb_mfcc and mel_filter are accepted here but are not forwarded to AudioFeatures.\n",
    "# The key 'sprectral_rolloff' keeps its original spelling; check the key AudioLibrary expects before renaming it.\n",
    "def global_feature_statistics(y, win_size=0.025, win_step=0.01, nb_mfcc=12, mel_filter=40,\n",
    "                              stats=['mean', 'std', 'med', 'kurt', 'skew', 'q1', 'q99', 'min', 'max', 'range'],\n",
    "                              features_list=['zcr', 'energy', 'energy_entropy', 'spectral_centroid', 'spectral_spread',\n",
    "                                             'spectral_entropy', 'spectral_flux', 'sprectral_rolloff', 'mfcc']):\n",
    "\n",
    "    # Extract the short-term features and summarize them with global statistics\n",
    "    audio_features = AudioFeatures(y, win_size, win_step)\n",
    "    features, features_names = audio_features.global_feature_extraction(stats=stats, features_list=features_list)\n",
    "    return features\n",
    "\n",
    "# Features extraction parameters (same values as the function defaults above)\n",
    "sample_rate = 16000 # Sample rate (16.0 kHz); the signals above were loaded at 44.1 kHz and later cells do not use this value\n",
    "win_size = 0.025    # Short term window size (25 msec)\n",
    "win_step = 0.01     # Short term window step (10 msec)\n",
    "nb_mfcc = 12        # Number of MFCCs coefficients (12)\n",
    "nb_filter = 40      # Number of filter banks (40)\n",
    "stats = ['mean', 'std', 'med', 'kurt', 'skew', 'q1', 'q99', 'min', 'max', 'range'] # Global statistics\n",
    "features_list =  ['zcr', 'energy', 'energy_entropy', 'spectral_centroid', 'spectral_spread', # Audio features\n",
    "                      'spectral_entropy', 'spectral_flux', 'sprectral_rolloff', 'mfcc']"
   ]
  },
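  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The window size and step above are given in seconds; converting them to sample counts shows how a signal is cut into overlapping frames. This is a minimal sketch with a hypothetical helper `frame_signal`. The actual framing happens inside `AudioLibrary` and may differ, for example in how the signal edges are padded.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def frame_signal(y, sample_rate, win_size=0.025, win_step=0.01):\n",
    "    # Hypothetical helper: cut y into overlapping frames, dropping the incomplete tail\n",
    "    frame_len = int(round(win_size * sample_rate))\n",
    "    hop_len = int(round(win_step * sample_rate))\n",
    "    n_frames = 1 + (len(y) - frame_len) // hop_len if len(y) >= frame_len else 0\n",
    "    # Row i holds the samples from i * hop_len to i * hop_len + frame_len - 1\n",
    "    starts = hop_len * np.arange(n_frames)\n",
    "    idx = starts[:, None] + np.arange(frame_len)[None, :]\n",
    "    return y[idx]\n",
    "\n",
    "# One second of placeholder audio at 16 kHz\n",
    "one_second = np.arange(16000, dtype=float)\n",
    "frames = frame_signal(one_second, 16000)\n",
    "```\n",
    "\n",
    "At 16 kHz a 25 ms window is 400 samples and a 10 ms step is 160 samples, so one second of audio yields 98 full frames in this sketch."
   ]
  },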
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-04-15T13:19:38.974213Z",
     "start_time": "2019-04-15T13:13:36.866069Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Feature extraction: START\n",
      "Feature extraction: END!\n"
     ]
    }
   ],
   "source": [
    "# Start feature extraction\n",
    "print(\"Feature extraction: START\")\n",
    "\n",
    "# Compute global feature statistics for every audio file (function defaults are used)\n",
    "features = np.asarray(list(map(global_feature_statistics, signal)))\n",
    "\n",
    "# Stop feature extraction\n",
    "print(\"Feature extraction: END!\")"
   ]
  },
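  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The global statistics turn a variable-length sequence of frame-level values into a fixed-length vector. The sketch below illustrates the idea for one feature track. It is not the `AudioLibrary` code, and several choices are assumptions: `q1` and `q99` are read as the 1st and 99th percentiles, `kurt` as the non-excess standardized fourth moment, and `std` as the population standard deviation.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def global_stats(x):\n",
    "    # Hypothetical helper: mean, std, med, kurt, skew, q1, q99, min, max, range\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    mean = x.mean()\n",
    "    std = x.std()\n",
    "    centered = x - mean\n",
    "    # Standardized third and fourth moments (set to 0 for a constant track)\n",
    "    skew = np.mean(centered ** 3) / std ** 3 if std > 0 else 0.0\n",
    "    kurt = np.mean(centered ** 4) / std ** 4 if std > 0 else 0.0\n",
    "    q1, q99 = np.percentile(x, [1, 99])\n",
    "    return np.array([mean, std, np.median(x), kurt, skew, q1, q99,\n",
    "                     x.min(), x.max(), x.max() - x.min()])\n",
    "\n",
    "# A toy frame-level track, e.g. one energy value per frame\n",
    "track_stats = global_stats([1.0, 2.0, 3.0, 4.0, 5.0])\n",
    "```\n",
    "\n",
    "Concatenating such vectors over all feature tracks gives one row per audio file, which is the shape a classifier such as an SVM expects."
   ]
  },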
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## VI. Save as"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-04-15T13:19:38.983530Z",
     "start_time": "2019-04-15T13:19:38.975722Z"
    }
   },
   "outputs": [],
   "source": [
    "# Save features and labels to a pickle file\n",
    "with open(\"../Datas/Pickle/[RAVDESS][HAP-SAD-NEU-ANG-FEA-DIS-SUR][GLOBAL_STATS].p\", 'wb') as f:\n",
    "    pickle.dump([features, labels], f)"
   ]
  }
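  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The saved file holds a two-element list, so a later notebook can unpack it directly into features and labels. A minimal round-trip sketch follows; the demo arrays and the temporary file name are placeholders, not the notebook's data or path.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import pickle\n",
    "import tempfile\n",
    "import numpy as np\n",
    "\n",
    "# Placeholder data with the same layout as [features, labels]\n",
    "features_demo = np.zeros((3, 4))\n",
    "labels_demo = np.array(['f_HAP', 'm_SAD', 'f_NEU'])\n",
    "\n",
    "# Write to a temporary file, then read it back\n",
    "demo_path = os.path.join(tempfile.mkdtemp(), 'demo_features.p')\n",
    "with open(demo_path, 'wb') as f:\n",
    "    pickle.dump([features_demo, labels_demo], f)\n",
    "with open(demo_path, 'rb') as f:\n",
    "    loaded_features, loaded_labels = pickle.load(f)\n",
    "```\n",
    "\n",
    "Opening the file in a `with` block closes the handle even if the dump or load raises an error."
   ]
  }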
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}