Communicating between Python and MT5

MetaTrader 5 (MT5) is one of the most popular platforms for trading on financial markets, and it has its own programming language (MQL5) for developing automated trading strategies.

Python includes incredibly powerful tools for machine learning and predictive analytics.

It seems like a link between these two would be a match made in heaven, enabling the use of machine learning in trading strategies. However, this link isn’t as straightforward as I had hoped. A common solution is to export data from MetaTrader manually and then analyse it in Python before setting up trading strategies. Unfortunately, this approach prevents the use of one of MetaTrader’s best features: automation.

I decided to create a DLL written in C++ to link MetaTrader 5 and Python, although I believe the DLL could be used by a number of other programs to link with Python as well.

Initially I attempted to create the DLL using Boost.Python. This strategy worked well, but unfortunately I had some issues compiling the library for 64-bit systems, to which my MetaTrader installation was limited. I therefore decided to use Python’s C API instead. An article by arnavguddu on embedding Python in C++ was particularly useful (https://www.codeproject.com/Articles/820116/Embedding-Python-program-in-a-C-Cplusplus-code). I made use of his code for pyhelper.hpp, whose CPyInstance and CPyObject classes wrap the interpreter lifetime and reference counting, which made the initialization and finalization of the Python environment much easier.

Next I created the RunPythonCode function below, which calls a Python function in a saved file, given the file name and function name (both passed as character arrays). Two variables are passed to the Python function: a 2D NumPy array of up to 20 fields (mql5_arr), and an array of integer arguments (args) whose length is passed in separately (args_size), since the length of an array cannot be recovered from a pointer inside the DLL.

#include <Python.h>
#include "pyhelper.h"
#include <numpy/arrayobject.h>

CPyInstance hInstance;
CPyObject pName, pModule, pFunc, pValue, pInputs, pargs, pArray;

long RunPythonCode(double *mql5_arr[][20], int array_rows, int array_cols, char FileName[], char FuncName[], int *args[], int args_size) {

	// Initialize the NumPy C API once
	if (PyArray_API == NULL)
	{
		import_array();
	}
	pName = PyUnicode_FromString(FileName);
	pModule = PyImport_Import(pName);

	if (pModule)
	{
		pFunc = PyObject_GetAttrString(pModule, FuncName);

		if (pFunc && PyCallable_Check(pFunc))
		{
			// Dimensions of the 2D array received from MQL5
			npy_intp dims[2]{ array_rows, array_cols };
			const int ND = 2;

			// Wrap the MQL5 buffer in a NumPy array (no copy is made)
			pArray = PyArray_SimpleNewFromData(ND, dims, NPY_DOUBLE, reinterpret_cast<void*>(mql5_arr));

			if (!pArray) {
				printf("Error converting to python array.\n");
				return -2;
			}

			// Add the NumPy array to the argument tuple
			pInputs = PyTuple_New(2); // Number of inputs to the function
			PyTuple_SetItem(pInputs, 0, pArray); // Note: PyTuple_SetItem steals the reference

			// Wrap args in a 1D NumPy array. The length must be passed in explicitly;
			// sizeof(args)/sizeof(int) would only measure the pointer, not the array.
			npy_intp argsdims[1]{ args_size };
			pargs = PyArray_SimpleNewFromData(1, argsdims, NPY_INT, reinterpret_cast<void*>(args));
			PyTuple_SetItem(pInputs, 1, pargs);

			// Execute function pFunc with the arguments in pInputs
			pValue = PyObject_CallObject(pFunc, pInputs);
			if (!pValue) {
				printf("ERROR: Python function raised an exception\n");
				return -4;
			}

			long pyResult = PyLong_AsLong(pValue);
			return pyResult;
		}
		else
		{
			printf("ERROR: Python function not found or not callable in module\n");
			return -1;
		}
	}
	else
	{
		printf_s("ERROR: Module not imported\n");
		return -3;
	}
}

The next bit of code is the only exported function of the DLL. The strings for the Python file and function names are received as wchar_t* (MQL5 strings are Unicode) and then converted to char arrays using wcstombs_s.

#include <wchar.h>
_DLLAPI long __stdcall CallPython(double *mql5_arr[][20], int array_rows, int array_cols, wchar_t* FileName, wchar_t* FuncName, int *args[], int args_size) // First column should be the targets
{
	// Convert the wide-character strings from MQL5 to char arrays
	size_t n_file = wcslen(FileName) + 1;
	size_t n_func = wcslen(FuncName) + 1;
	size_t i_file, i_func;
	char *CharFileName = new char[n_file];
	char *CharFuncName = new char[n_func];
	wcstombs_s(&i_file, CharFileName, n_file, FileName, n_file);
	wcstombs_s(&i_func, CharFuncName, n_func, FuncName, n_func);

	long Result = RunPythonCode(mql5_arr, array_rows, array_cols, CharFileName, CharFuncName, args, args_size);
	delete[] CharFileName; // Memory from new[] must be released with delete[]
	delete[] CharFuncName;
	return Result;
}

The DLL can be imported into MQL5 using:

#import "D:\\Documents\\My_Projects\\Markets\\PythonDll\\x64\\Release\\PythonDll.dll"
   long CallPython(double &a[][11], int array_rows, int array_cols, string &FileName, string &FuncName, int &args[]);
#import

An example of the use of the DLL, in which a function called WriteData is called from a Python script named PythonCode.py, is:

void CallPython()
 {
   Print("Writing data to file...");
   double AllInputs[][11];
   GetMarketInputs(AllInputs,NumRows); // Add data to the matrix
   int args[] = {1,1}; // Additional args for the Python function if needed
   string FileString = "PythonCode";
   string FuncString = "WriteData";
   Print("Result = ", CallPython(AllInputs,NumRows,NumCols,FileString,FuncString,args,ArraySize(args))); // NumCols = 11 for this example
 }

The Python script associated with this example is:

 
import csv

def WriteData(a, args):
    DataFileName = r'D:\Documents\My_Projects\Markets\AllData.csv' # Raw string so backslashes are not treated as escapes
    print('Python write function called!')
    with open(DataFileName, 'w') as file:
        writer = csv.writer(file, lineterminator='\n')
        for row in a:
            writer.writerow(row)
    return 1

For additional information or to download the files, please see the repository at:
https://github.com/sdswart/pythondll

Probability Theory – Naive Bayes

Bayes’ theorem has been described as being to the theory of probability what the Pythagorean theorem is to geometry [1]. Simply put, Bayes’ theorem describes the probability of a hypothesis based on evidence that may be related to the hypothesis. For example, suppose you want to determine the probability that a person has cancer given that he smokes. This probability can be determined with Bayes’ theorem by considering the probability that someone smokes given that he has cancer, together with the probability of getting cancer and the probability that someone smokes (whatever the cause). Mathematically, Bayes’ theorem is expressed as:

    \[ P(H|e)=\frac{P(e|H)P(H)}{P(e)} \]

where:
• P(H|e) is the probability that our hypothesis is true given the evidence (Posterior).
• P(e|H) is the probability of the evidence given that our hypothesis is true (Likelihood).
• P(H) is the probability of our hypothesis before the evidence (Prior).
• P(e) is the probability of the evidence (Marginal). P(e) = ∑P(e|H_i)P(H_i)

Therefore, our example could be expressed as:

    \[ P(Cancer|Smokes)=\frac{P(Smokes|Cancer)P(Cancer)}{P(Smokes)} \]

where:
• P(Cancer|Smokes) is the probability that a person has cancer given that he smokes.
• P(Smokes|Cancer) is the probability that someone smokes given that he has cancer.
• P(Cancer) is the probability of someone having cancer.
• P(Smokes) is the probability that someone smokes.
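To make this concrete, suppose (with purely illustrative numbers) that 25% of cancer patients smoke, 1% of the population has cancer, and 20% of the population smokes. Then:

    \[ P(Cancer|Smokes)=\frac{0.25\times 0.01}{0.20}=0.0125 \]

so under these made-up numbers, the evidence of smoking raises the probability of cancer from the 1% base rate to 1.25%.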

In practice, there is often interest only in the numerator of the equation, because the denominator does not depend on the hypothesis and, once the evidence is given, it is effectively a constant.
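Bayes’ theorem can therefore be reduced to a proportional form, which is enough to compare competing hypotheses:

    \[ P(H|e)\propto P(e|H)P(H) \]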

Bayes’ theorem can be extended to probability distributions (to determine confidence intervals and other statistical tests). For example, suppose a survey of 20 people with cancer found that 5 of them smoked. To be safe, we create our prior with the assumption that we know nothing about the relationship between smoking and cancer (i.e. every smoking rate among people with cancer is equally likely). This can be represented as a uniform distribution:

#%% Python Learn Function
# Import libraries
import numpy as np
from matplotlib import pyplot as plt
#Create our uniform distribution
n_draws = 100000 #Number of points to consider in the distribution (should be large)
prior = np.random.uniform(0,1,n_draws)
#Plot the distribution as a histogram
plt.hist(prior,21)

Uniform distribution

We can now use our prior to generate a binomial distribution for our sample size of 20 people:

gen=np.random.binomial(20,prior)
plt.hist(gen,21)

Uniform binomial distribution

The posterior distribution for our survey can be obtained by selecting the generated points that reproduced our observed result of 5 smokers:

data = 5 # The observed number of smokers in the survey
post = [x for x, y in zip(prior, gen) if y == data]
plt.hist(post, 21)

Posterior distribution

If new data is obtained later, the posterior distribution can be used as the prior hypothesis for the new evidence.
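As a sketch of how such an update might look, we can continue the example above with a hypothetical second survey of 30 people with cancer, 8 of whom smoked (the numbers are made up for illustration):

new_prior = np.random.choice(post, n_draws) # Resample the old posterior to act as the new prior
new_gen = np.random.binomial(30, new_prior) # Simulate the second survey for each candidate smoking rate
new_post = [x for x, y in zip(new_prior, new_gen) if y == 8] # Keep the draws that reproduce the new data
plt.hist(new_post, 21)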

The main pros and cons of using Bayesian models in machine learning are:

Positive: Performs well for very small datasets.
Negative: Bayesian models are computationally expensive and slow.

In machine learning, Bayes’ theorem is primarily used for classification, in what are called naive Bayes classifiers. They are called naive because they make one big assumption:

Each feature is independent of every other feature, given the class.

This assumption means that, given a set of features, we can compare how probable each class is using the relation:

    \[ P(C_k|x_1,x_2,\dots,x_n)\propto P(C_k)\times P(x_1|C_k)\times P(x_2|C_k)\times\dots\times P(x_n|C_k) \]

where
• C_k is the class k
• x_n is the value of feature n
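Classification then amounts to evaluating this product for every class and choosing the class with the largest value:

    \[ \hat{y}=\underset{k}{\operatorname{argmax}}\; P(C_k)\prod_{i=1}^{n}P(x_i|C_k) \]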

When dealing with continuous data, Gaussian naive Bayes can be used, in which it is assumed that the values of each feature within a class are distributed according to a Gaussian distribution. Therefore, the probability of a feature value belonging to a class, P(x_n|C_k), can be calculated using the mean and variance of that feature’s data for the class. The equation is:

    \[ P(x_n|C_k)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\bigg(\frac{-(x_n-\mu)^2}{2\sigma^2}\bigg) \]

where
• \sigma^2 is the variance
• \mu is the mean
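For example, with illustrative values of \mu = 5.0 and \sigma^2 = 0.64 for one feature in some class, a new observation x_n = 5.5 would score:

    \[ P(x_n|C_k)=\frac{1}{\sqrt{2\pi\times 0.64}}\exp\bigg(\frac{-(5.5-5.0)^2}{2\times 0.64}\bigg)\approx 0.41 \]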

Therefore, we can develop our own Gaussian naive Bayes model:

class GaussianBayes:
    def __init__(self, data, target):
        # Perform error checks and store the number of features in data
        assert type(data) is np.ndarray, "data must be an array"
        assert data.ndim <= 2, "The data should be a 2D or 1D array."
        self.num_features = 1 if data.ndim == 1 else data.shape[1]
        num_points = len(data) if data.ndim == 1 else data.shape[0]
        assert num_points == len(target), "Number of points in data (%d) is not equal to the number of targets (%d)" % (num_points, len(target))
        
        # Separate the data by class
        self.separated = {}
        self.separate(data, target)
        
        # Record the prior P(C_k), mean and variance of the data for each class
        self.priors = {i: len(self.separated[i]) / num_points for i in self.separated}
        self.model = {i: [np.mean(self.separated[i], axis=0), np.var(self.separated[i], axis=0)] for i in self.separated}
    def predict(self, data):
        # Perform error checks
        features_in_data = len(data) if data.ndim == 1 else data.shape[1]
        assert features_in_data == self.num_features, "Number of features in data (%d) not equal to number of features in training data (%d)" % (features_in_data, self.num_features)
        
        # Create a function for the Gaussian probability
        Gaus = lambda val, mean, variance: (1 / (2 * np.pi * variance) ** 0.5) * np.exp((-(val - mean) ** 2) / (2 * variance))
        class_names = list(self.model.keys())
        
        # Return the result directly if only one data point is entered
        if data.ndim == 1:
            return class_names[np.argmax([self.priors[i] * np.prod(Gaus(np.array(data), np.array(self.model[i][0]), np.array(self.model[i][1]))) for i in class_names])]
        
        # Create a results array and add the joint probability for each class for each data point
        results = np.zeros((data.shape[0], len(class_names)))
        for class_index in range(len(class_names)):
            np_mean = np.array(self.model[class_names[class_index]][0])
            np_var = np.array(self.model[class_names[class_index]][1])
            prior = self.priors[class_names[class_index]]
            for k in range(data.shape[0]):
                np_data = np.array(data[k, :])
                joint_probability = prior * np.prod(Gaus(np_data, np_mean, np_var))
                results[k, class_index] = joint_probability
        
        # Return the class name with the highest joint probability for each data point
        return [class_names[i] for i in np.argmax(results, axis=1)]
    def separate(self, data, target):
        # Group the training points by their target class
        for i in range(len(target)):
            if target[i] not in self.separated:
                self.separated[target[i]] = []
            self.separated[target[i]].append(data[i])
If we evaluate the model with the iris dataset in sklearn we get the following:

from sklearn import datasets
iris = datasets.load_iris()
y_pred = GaussianBayes(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points from our model out of a total %d points : %d"
      % (iris.data.shape[0],(iris.target != y_pred).sum()))
Number of mislabeled points from our model 
     out of a total 150 points : 6

Comparing this to the Gaussian naive Bayes model included in sklearn we get:

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
iris = datasets.load_iris()
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points from sklearn model out of a total %d points : %d"
      % (iris.data.shape[0],(iris.target != y_pred).sum()))
Number of mislabeled points from sklearn model 
    out of a total 150 points : 6

Therefore, our model produces the same result as that of the model in sklearn.

For more information on classification algorithms, please see my post:
Comparison of classification algorithms

References
1. https://en.wikipedia.org/wiki/Bayes%27_theorem