Classifying Thought Processes via Brain-Computer Interface using Machine Learning Concept Design Review Board Report
Group Members:
Hadi Murtuza (EE 1051)
Daniyal Khawaja (EE 1053)
Usman Shahid (EE 1057)
Rahim Rasool (EE 1058)

Project Advisor:
Dr. Asim Khwaja

Co-Advisor:
Dr. Tariq Mairaj

Pakistan Navy Engineering College
National University of Sciences and Technology
Department of Electronics and Power Engineering
Table of Contents

Problem Statement
Project Objective and Expected Outcome
Dimensionality Reduction
    Background
    t-Distributed Stochastic Neighbor Embedding
    Principal Component Analysis
Headset
    Headset Placement
    Signal Quality
    Connectivity and Transmission
    Emotiv Pro
Modular Division
    Data Acquisition
    Feature Extraction
    Algorithm Design and Plotting
Project Workflow
Risk Analysis
Possible Applications
Problem Statement

A Brain-Computer Interface (BCI) is a system that provides direct communication between a human brain and a computer. Research in BCIs began in the early 1970s and focused mainly on neuroprosthetic applications aimed at restoring damaged hearing, sight and movement. With the boom in machine learning in recent years, however, interest in BCIs has shifted from neuroprosthetics to direct brain-to-brain communication and thought process decoding. Machine learning has found a solid application in the field of BCI, with intelligent systems being used to learn and understand raw brain signals, but the field has not seen as much development as other branches of machine learning.

An exciting direction in the field is using brainwave data, together with the power of machine learning, to overcome human disabilities. Considerable work has already been done on enabling communication between computers and the human brain, but much remains to be done on decoding human thought patterns. A major problem this technology can address is speech disability. The problem, faced by a significant fraction of the world's population, can be tackled by combining BCI with machine learning: a computer system that reads a person's brainwaves via an EEG headset, picks out the strongest thought, classifies it to deduce its meaning, and finally prints the word on a screen or speaks it through a speaker.
Project Objective and Expected Outcome
In our project, we aim to classify brainwaves according to the thoughts they represent. As a simple foray into the field of BCI, our project uses machine learning to classify brainwave data according to the thought process it represents; the first step towards thought process decoding. At the very basic level, our system would be trained on raw brainwave data acquired from over 200 individuals over the course of 2 months. It is to be noted that the system would be truly general purpose, as the dataset would be built up by choosing subjects at random. For the project, we are using the Emotiv Epoc+ headset [1]. The headset performs the necessary signal detection, noise filtering, preprocessing and cleaning, and then transmits the raw brainwaves to the system via BLE. The headset samples EEG waves at a rate of 128 samples per second from a total of 14 different channels (as defined in the 10/20 system). Each sample is taken for a total of 10 seconds, bringing the recorded data to a total of 17,920 data points per sample per individual. The system regards each sample as a data matrix of vectors, i.e. a 1280x14 matrix where each column is a single brainwave from one channel. The inputs would then be fed into the system, which would use a statistical technique called t-Distributed Stochastic Neighbor Embedding (t-SNE) to cluster the data points based on their relationship with each other and display them on a 3-dimensional graph. Once classified, the resulting test sample will then be forwarded to a speech synthesizer that outputs the word the sample represents via a speaker. Because the algorithm is being run on a varied dataset of random individuals, the resulting graph will show whether the system is able to clearly distinguish between similar brainwaves from different individuals. Another technique, Principal Component Analysis, will also be used as a control algorithm for the system.
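As a minimal sketch of the data layout described above (the array here is random and merely stands in for a real recording), each 10-second, 14-channel recording becomes a 1280x14 matrix, which is flattened into a single 17,920-dimensional point for dimensionality reduction:

```python
import numpy as np

FS = 128          # samples per second (headset sampling rate)
DURATION = 10     # seconds per recording
N_CHANNELS = 14   # EEG channels on the Epoc+

# One recording: a 1280x14 matrix, each column one channel's waveform.
# Random values stand in for a real recording here.
rng = np.random.default_rng(0)
sample = rng.normal(size=(FS * DURATION, N_CHANNELS))

print(sample.shape)   # (1280, 14)
print(sample.size)    # 17920 data points per sample

# For dimensionality reduction, the recording is flattened into one
# high-dimensional point.
point = sample.reshape(-1)
print(point.shape)    # (17920,)
```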
Dimensionality Reduction

Background:
Visualization of high-dimensional data is an important problem in many different domains, dealing with data of widely varying dimensionality. Many real-world datasets are functions of hundreds to thousands of variables, which makes them very hard to understand intuitively. Cell nuclei relevant to breast cancer, for example, are described by approximately 30 variables, whereas the pixel vectors used to represent images typically have thousands of dimensions. Dimensionality reduction methods convert high-dimensional datasets into two- or three-dimensional data that can be displayed in a scatterplot. The new low-dimensional data is represented as a map of individual datapoints (map points). The aim of dimensionality reduction is to preserve as much of the significant structure of the high-dimensional data as possible in the low-dimensional map. Various techniques for this problem have been proposed that differ in the type of structure they preserve. Principal Component Analysis is a common linear dimensionality reduction technique that focuses on keeping the low-dimensional representations of dissimilar datapoints far apart. For high-dimensional data that lies on or near a low-dimensional, non-linear manifold, it is usually more important to keep the low-dimensional representations of very similar datapoints close together, which is typically not possible with a linear mapping. t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that has been successful in accurately visualizing high-dimensional data with up to thousands of dimensions. It models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points.
In our project, we analyze the effects of both PCA and t-SNE on a real-world brainwave dataset and try to determine whether the algorithms are able to accurately classify it.
t-Distributed Stochastic Neighbor Embedding:

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a technique for dimensionality reduction that is particularly well suited to the visualization of high-dimensional datasets (like our 1280x14-point high-dimensional dataset). Based on the accurate and very popular Stochastic Neighbor Embedding (SNE) technique, t-SNE features small changes in the underlying methods, resulting in better accuracy. At the very basic level, the algorithm builds up two similarity matrices of conditional probabilities of points in the high- and low-dimensional spaces respectively. If the map points in the low-dimensional space correctly model the similarity between the points in the high-dimensional space, the conditional probabilities of the two sets will be equal. Once the similarity matrix of the points in the high-dimensional space is built up, the algorithm attempts to find points in the low-dimensional space that minimize the probability error between the two matrices using gradient descent [2].

t-SNE starts by converting the high-dimensional Euclidean distances between datapoints into conditional probabilities that represent similarities. The similarity of datapoint x_j to datapoint x_i is the conditional probability p_{j|i} that x_i would pick x_j as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian distribution centered at x_i. For nearby datapoints, p_{j|i} is relatively high, whereas for widely separated datapoints it will be almost infinitesimal. Mathematically, the conditional probability p_{j|i} is given by

    p_{j|i} = exp(-||x_i - x_j||^2 / 2*sigma_i^2) / sum_{k != i} exp(-||x_i - x_k||^2 / 2*sigma_i^2)

where sigma_i is the standard deviation of the Gaussian distribution centered at x_i. p_{i|i} is set to zero, as only similarities between distinct datapoints are required.

At the other end, a similar matrix of conditional probabilities (q_{j|i}) can be constructed for the low-dimensional map points y_i and y_j.
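As an illustration, the high-dimensional similarity matrix p_{j|i} described above can be computed directly from pairwise distances. This is a minimal sketch using a single shared sigma; the full t-SNE algorithm chooses a separate sigma_i per point to match a target perplexity:

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Gaussian conditional similarities p_{j|i} for the rows of X.

    Direct transcription of the formula above; a single sigma is used
    here for simplicity instead of a per-point sigma_i.
    """
    # Squared Euclidean distances between all pairs of rows.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    affinities = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)   # p_{i|i} = 0
    # Normalize each row so the conditionals sum to 1 per point.
    return affinities / affinities.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))   # 5 toy datapoints in 3 dimensions
P = conditional_probabilities(X)
print(P.sum(axis=1))          # each row sums to 1
```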
Unlike SNE, t-SNE uses a t-distribution to calculate the conditional probability of the low-dimensional map points, instead of a Gaussian. This helps eliminate an optimization problem that plagues the SNE algorithm [2].
Mathematically, the conditional probability q_{j|i} of the low-dimensional map points, using a Student t-distribution with one degree of freedom, is given by

    q_{j|i} = (1 + ||y_i - y_j||^2)^(-1) / sum_{k != i} (1 + ||y_i - y_k||^2)^(-1)

If map points y_i and y_j correctly model the similarity between the original datapoints x_i and x_j, the conditional probabilities of the two sets will be equal, i.e. q_{j|i} = p_{j|i}. t-SNE aims to find the values of y_i and y_j that minimize the error between q_{j|i} and p_{j|i}.
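The low-dimensional similarities and the error that gradient descent minimizes can be sketched as follows. Note one assumption: this sketch uses the joint (matrix-wide) normalization from the original t-SNE paper rather than the per-point conditional normalization, and the error is the Kullback-Leibler divergence between the two distributions:

```python
import numpy as np

def student_t_similarities(Y):
    """Low-dimensional similarities using t-SNE's Student-t kernel."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq)          # (1 + ||y_i - y_j||^2)^(-1)
    np.fill_diagonal(inv, 0.0)      # q_{i|i} = 0
    return inv / inv.sum()          # joint normalization over all pairs

def kl_divergence(P, Q, eps=1e-12):
    """The cost t-SNE minimizes by gradient descent: KL(P || Q)."""
    return np.sum(P * np.log((P + eps) / (Q + eps)))

rng = np.random.default_rng(2)
Y = rng.normal(size=(6, 2))         # 6 toy map points in 2-D
Q = student_t_similarities(Y)
print(round(Q.sum(), 6))            # 1.0
print(kl_divergence(Q, Q))          # 0.0 when the two distributions agree
```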
t-SNE provides more accurate results than Principal Component Analysis and other linear dimensionality reduction algorithms over a variety of datasets. This is mainly because linear algorithms are generally not good at modelling curved manifolds such as the ones built up by the conditional probabilities in the high-dimensional space. Unlike linear algorithms, t-SNE focuses on preserving the distances between nearby datapoints rather than widely separated ones, thus retaining the local structure of the data in the low-dimensional map. We will, however, use Principal Component Analysis in our project as a control algorithm to compare results against.
An implementation of t-SNE on the MNIST dataset [3] is shown. The sample used consists of 6,000 images of handwritten digits between 0 and 9, drawn from the MNIST dataset. t-SNE is able to learn from the dataset and accurately map each image near its counterparts.
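In practice, an off-the-shelf t-SNE such as scikit-learn's can be used rather than a from-scratch implementation. A hedged sketch, assuming scikit-learn is installed; two synthetic Gaussian blobs stand in for the MNIST images so the example stays self-contained:

```python
import numpy as np
from sklearn.manifold import TSNE

# Two well-separated synthetic clusters stand in for the digit images.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, size=(50, 30)),
               rng.normal(8, 1, size=(50, 30))])

# Embed the 30-dimensional points into 2-D for plotting.
embedding = TSNE(n_components=2, perplexity=20,
                 random_state=0).fit_transform(X)
print(embedding.shape)   # (100, 2)
```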
Figure 1: t-SNE's result on the MNIST dataset
Principal Component Analysis:

Principal Component Analysis (PCA) is a statistical technique invented in the early 1900s as an extension of the principal axis theorem in mechanics. Due to its ability to reduce the number of dimensions of a large dataset, the technique found particular use in the field of data science as a tested dimensionality reduction algorithm. At the simplest level, PCA systematically reduces the dimensions of a dataset by calculating its principal components and eliminating the component with the least variation. This continues until the data is down to the required number of dimensions [4].

The principal components of a dataset represent the underlying structure in the data that is not visible in the original high-dimensional representation. They are the directional vectors along which the set of datapoints has the most variance. The principal component with the least variance represents the direction in which the data varies the least; removing it reduces the number of dimensions of the dataset by 1.

Mathematically, the principal components of a dataset are given by its eigenvalues and eigenvectors. The number of eigenvalue-eigenvector pairs will be equal to the number of dimensions the dataset has [4]. For a dataset represented by a matrix A, the eigenvalues lambda are the solutions of the characteristic equation

    det(A - lambda*I) = 0

and the eigenvector x corresponding to an eigenvalue lambda satisfies

    (A - lambda*I) x = 0
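The eigen-decomposition above can be carried out numerically with NumPy. A minimal sketch (the random matrix stands in for a real dataset): the covariance matrix is diagonalized, the eigenvectors are sorted by decreasing eigenvalue, and the data is projected onto the top components:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via eigendecomposition
    of the covariance matrix, as described above."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))   # 100 toy datapoints in 5 dimensions
X_reduced = pca(X, 2)
print(X_reduced.shape)          # (100, 2)
```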
Even though PCA is widely used as a dimensionality reduction technique, it can be unreliable because it does not preserve the underlying correlation between datapoints. The goal of dimensionality reduction, as such, is simply to reduce the dimensions of a large dataset; it carries no guarantee that the new dimensions are interpretable. As a result, some information is lost after each iteration of the algorithm. For datasets with a large number of dimensions and large variations between their datapoints, this can lead to a significant loss of information in the final low-dimensional representation. This problem is avoided in t-SNE, as the algorithm takes into account the correlation between each pair of datapoints and aims to preserve both the global and the local structure of the dataset.
An implementation of PCA on the same 6,000-image MNIST sample is shown. The algorithm is indeed able to cluster the images with a certain level of accuracy, but the clusters are packed together, sometimes overlap, and lack the distinct separation found in t-SNE.
Figure 2: PCA implementation on the MNIST dataset
Headset
The EEG headset we'll be using for our project is the Emotiv EPOC+ [1]. The EPOC+ is a 14-channel wireless EEG device, designed for contextualized research and advanced brain-computer interface (BCI) applications.

It features 14 EEG channels plus 2 reference channels and offers optimal positioning for accurate spatial resolution. The device has been designed for practical research applications; its specifications are as follows:

Figure 3: The Epoc+ Headset
Specifications (EEG Headset):

Number of Channels:      14
Channel Names:           AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4
Sampling Method:         Sequential sampling (single ADC)
Sampling Rate:           128 SPS (2048 Hz internal)
Resolution:              14 bits, 1 LSB = 0.51 μV (16-bit ADC, 2 bits of instrumental noise discarded)
Bandwidth:               0.2 - 45 Hz, digital notch filters at 50 Hz and 60 Hz
Filtering:               Built-in digital 5th-order Sinc filter
Dynamic Range:           8400 μV (pp)
Coupling Mode:           AC coupled
Connectivity:            Proprietary wireless, 2.4 GHz band
Power:                   LiPoly
Battery Life:            12 hours
Impedance Measurement:   Real-time contact quality using patented system
Headset Placement

Correct placement of the rubber sensors is critical for correct operation. The Emotiv EPOC consists of 14 EEG channels plus 2 reference channels, offering optimal positioning for accurate spatial resolution. The device uses the international 10/20 electrode location system for electrode placement [5]. The gold-plated, passive electrodes contact the scalp through saline-soaked felt pads. The headset also includes two accelerometers for measuring head movement. Once the headset is connected via the wireless USB receiver, the headset setup panel is displayed. The main function of this panel is to display the contact quality feedback for the neuroheadset's EEG sensors. The headset's 14 main plus 2 reference electrodes measure the potential difference across the skull. However, with no absolute reference point for any of the 14 electrodes, they are paired up into groups of 2, with the difference between the pair being used to represent the raw EEG data.

Figure 5: Placing the Headset
Sensor Name       Region of the Brain the Sensor is Located Over
AF3               Dorsolateral Prefrontal Cortex
F3                Frontal Eye Fields
AF4               Dorsolateral Prefrontal Cortex
F4                Frontal Eye Fields
F8                Anterior Prefrontal Cortex
FC6               Dorsolateral Prefrontal Cortex
T8                Primary Gustatory Cortex
DRL (Reference)   Middle Temporal Gyrus
P8                Primary Motor Cortex
O2                Somatosensory Association Cortex
O1                Somatosensory Association Cortex
P7                Primary Motor Cortex
CMS (Reference)   Middle Temporal Gyrus
T7                Primary Gustatory Cortex
F7                Anterior Prefrontal Cortex
FC5               Dorsolateral Prefrontal Cortex
Figure 6: Electrode positions according to the 10/20 system
Signal Quality:

The figure below represents the sensor locations when looking down on the user's head from above. Each circle represents a sensor and its approximate location when wearing the headset. The application used to detect signal quality is the Emotiv Xavier Control Panel, available for free on the Emotiv website. The color of each sensor is a representation of its contact quality. To achieve the best possible contact quality, all the sensors should show as green. Other colors indicate:

Black - No signal
Red - Bad signal
Orange - Poor signal
Connectivity and Transmission: The headset communicates over BLE with the Emotiv Wireless USB receiver which can be inserted into any USB port of a computer.
Emotiv Pro:

As mentioned earlier, the software we'll be using to collect raw brainwave data is Emotiv Pro.

Emotiv Pro gives us a real-time display of EMOTIV headset data streams, including raw EEG, motion data, data packet acquisition and loss, and contact quality. It also allows us to export the data in .csv format for further processing, and to process the signals in real time, giving detailed information about signal power and the ability to apply Fourier transforms to visualize the frequency content of the signals.
A comma-separated values (CSV) file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. A CSV file of a 10-second-long recording of raw brainwave data is a 1280 by 14 matrix; the number of rows varies with the duration of the recording. The data is then imported into Python, where we use feature extraction to eliminate the baseline from each dataset.
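Importing such a recording into Python can be sketched as follows. A real file would be read with its filename (e.g. `np.loadtxt("recording.csv", delimiter=",")`); a short in-memory stand-in keeps the example self-contained, and the one-column-per-channel layout is an assumption:

```python
import io
import numpy as np

# Synthetic CSV text standing in for a real 10-second recording:
# 1280 rows (samples), 14 comma-separated columns (channels).
csv_text = "\n".join(",".join("0.5" for _ in range(14))
                     for _ in range(1280))

data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(data.shape)   # (1280, 14)
```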
Modular Division:

Data Acquisition:
The first step of our project will be gathering data from at least 200 test subjects. Each subject will wear the headset and will be asked to focus on 2 specific objects plus a baseline (neutral) thought, giving 3 thought processes in total. The software we'll use to record the data is Emotiv Pro, available on the Emotiv online store. Each dataset is sampled at 128 samples per second and stored in a .csv file as a collection of datapoints.
Feature Extraction:

The neutral thought will be measured by asking the subject to stare at a black screen for a set time and keep other thoughts to a minimum; they'll be asked to be completely relaxed. They'll then be asked to focus on 2 objects, say an apple and a basketball. The raw data from the 3 thought processes will then be cleaned of noise by subtracting the baseline thought from the apple and basketball thoughts via the NumPy library in Python.
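The baseline subtraction step can be sketched as a single NumPy operation. The arrays below are synthetic stand-ins for real recordings (the small 0.1 offset is an invented "thought-specific" component for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
baseline = rng.normal(size=(1280, 14))   # neutral (blank-screen) recording
# Stand-in "apple" recording: baseline plus a small extra component.
apple = baseline + rng.normal(0.1, 0.05, size=(1280, 14))

# Element-wise subtraction of the baseline leaves the thought-specific
# component, per the cleaning step described above.
apple_clean = apple - baseline
print(apple_clean.shape)   # (1280, 14)
print(apple_clean.mean())  # close to the injected 0.1 offset
```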
Algorithm Design and Plotting:

Both t-SNE and PCA are popular and widely used techniques but have seldom been used to visualize brainwave datasets. Brainwave data is unique in its structure because it is an amalgamation of 14 different waves from different parts of the brain that need to be analyzed simultaneously, as patterns can span different lobes rather than being limited to a single waveform. In addition, the sampled dataset is merely a matrix of distinct voltage levels at the sensor placement sites. The algorithm needs to be taught how to interpret each column in the dataset, as well as how to treat each of the 14 waveforms as a dimension of a single datapoint. This needs to be done manually by tweaking some key parameters of the algorithms. In addition, as with all machine learning algorithms, other hyperparameters also need to be tuned in order to get the best result for the dataset concerned. These are set by trial and error, using random sets of values and extrapolating how to improve them by looking at the outputs they generate.
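One simple form of the trial-and-error tuning described above is to try a few candidate values of a key t-SNE hyperparameter (perplexity here) and keep the embedding with the lowest final KL divergence. A hedged sketch assuming scikit-learn is available; the data and candidate values are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 20))   # stand-in for flattened recordings

# Try a few perplexity values and record the final cost of each run.
results = {}
for perplexity in (5, 15, 25):
    model = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    model.fit_transform(X)
    results[perplexity] = model.kl_divergence_

best = min(results, key=results.get)   # lowest remaining KL divergence
print(best in (5, 15, 25))             # True
```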
Project Workflow
Data Acquisition -> Noise Removal -> Feature Extraction, then two parallel branches:

  - t-SNE branch: Similarity Matrix -> t-SNE Algorithm Output
  - PCA branch: Principal Components -> PCA Algorithm Output

Both outputs feed into a final Comparison Analysis.
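The two parallel branches of the workflow can be sketched end-to-end with scikit-learn (an assumption; the project could equally use hand-rolled implementations). Three synthetic "thought" classes stand in for real recordings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Three well-separated synthetic classes stand in for the three
# thought processes (neutral, apple, basketball).
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 1.0, size=(30, 40)) for c in (0, 5, 10)])

pca_out = PCA(n_components=2).fit_transform(X)       # PCA branch
tsne_out = TSNE(n_components=2, perplexity=15,
                random_state=0).fit_transform(X)     # t-SNE branch

# Both 2-D outputs would then be plotted side by side for comparison.
print(pca_out.shape, tsne_out.shape)   # (90, 2) (90, 2)
```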
Risk Analysis

1. Signal lost during transmission from the headset
There will be times when the headset stops transmitting brainwave data and loses connection with the PC entirely. This is made evident when the dots representing each of the 14 channels on the control panel turn black. This may happen for a number of reasons:
a) The user moved their head and changed the sensor orientation on their scalp.
b) The sensors haven't been hydrated properly.
c) The user is standing at a distance and the Bluetooth receiver can't pick up the data being sent.
All of these points can be remedied by making sure that the user is in close proximity to the Bluetooth receiver, that the sensors are properly hydrated before use, and that the user keeps their head as still as possible.

2. A headset sensor stops working
14 sensors are supplied with the headset, but because their position isn't very secure, there's a good chance that they fall out and/or stop working completely due to excessive use. We bought an extra set of sensors and felt pads should such a situation occur.

3. User thinks of something else
To gather data for our project, we need people to focus on a specific object for at least 10 seconds. Human nature, however, is unpredictable, and sometimes our minds wander even when we try not to let them. A user thinking of any other event instead of explicitly focusing on the object they've been given will ruin the dataset.
Possible Applications:

After successful implementation of our proposed concept, our team has drawn up a group of applications that could be developed using the core technology described above.
The primary application of the brain-computer interface that our team plans to develop is speech detection from brain signals for mute people. This can be implemented by scanning a person's brainwaves and classifying the signal to determine what word the person is trying to speak. In this case, we will keep our vocabulary limited to a few words and pre-train the system on that set of words using a limited number of people's brainwaves. After developing an appropriate machine learning model and training it with other people's data, the program could be generalized and made universal, without needing to be trained specifically for every person as is done right now.
References:

1. Emotiv EPOC+ EEG Headset. Designed and developed by Emotiv Inc. www.emotiv.com
2. L. van der Maaten and G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008.
3. Yann LeCun. The MNIST Database of Handwritten Digits. Courant Institute, NYU.
4. Matt Lang. Investigating the EPOC for cognitive control. University of Canterbury, 2012.
5. 10/20 System Positioning Manual. Trans Cranial Technologies.