Changes from all commits (20 commits)
- 217e9fc: merge deployment branch into local branch (KyleHaggin, Jan 30, 2020)
- 15b2f40: merge deployment branch with local branch (KyleHaggin, Jan 31, 2020)
- 294c7f9: Merge branch 'deployment' of https://github.com/Lambda-School-Labs/cr… (KyleHaggin, Feb 2, 2020)
- 2d031f5: Merge deployment branch to local branch (KyleHaggin, Feb 2, 2020)
- 7d50f38: pep8 conformity (KyleHaggin, Feb 3, 2020)
- ac1d570: cryptolytic.model.data_work up to pep8 conformity (KyleHaggin, Feb 3, 2020)
- e09b4b1: cryptolytic.model pep8 conformed. blank files deleted (KyleHaggin, Feb 3, 2020)
- bdf37a1: cryptolytic.data.__init conformed to pep8 standards (KyleHaggin, Feb 3, 2020)
- f71518c: cryptolytic.data.aws conformed to pep8 standards (KyleHaggin, Feb 3, 2020)
- 059fd92: cryptolytic.data.historical pep8 conformed (KyleHaggin, Feb 3, 2020)
- 4b630b0: cryptolytic.data.metrics pep8 conformed (KyleHaggin, Feb 3, 2020)
- 7899802: cryptolytic.data.spl pep8 conformed (KyleHaggin, Feb 3, 2020)
- 3e5266c: cryptolytic.data.utils pep8 conformed (KyleHaggin, Feb 3, 2020)
- df0dfb7: cryptolytic.util pep8 conformed (KyleHaggin, Feb 3, 2020)
- 2e8ef3e: cryptolytic.start pep8 conformed (KyleHaggin, Feb 3, 2020)
- c45fa9c: more commenting on cryptolytic.data.aws (KyleHaggin, Feb 3, 2020)
- 6845b6c: Updated README.md with new data source API documentations and locatio… (KyleHaggin, Feb 5, 2020)
- d01d79e: Updated predicitons and features sections of the README.md to reflect… (KyleHaggin, Feb 5, 2020)
- 56d2fda: Deleted unneeded whitespace (KyleHaggin, Feb 5, 2020)
- 96d6400: Deleted unneeded whitespace (KyleHaggin, Feb 5, 2020)
48 changes: 27 additions & 21 deletions README.md
@@ -43,35 +43,28 @@ Python, AWS, PostgreSQL, SQL, Flask

### Predictions

- The models folder contains two zip files, with a total of 30 models:
-
- tr_pickles.zip contains nine pickled trade recommender models.
-
- arb_models.zip contains 21 pickled arbitrage models.
-
- All 30 models use a RandomForestClassifier algorithm.
-
- Each trade recommender model recommends trades for a particular trading pair on a particular exchange by predicting whether the closing price will increase by enough to cover the costs of executing a trade.
-
- The arbitrage models predict arbitrage opportunities between two exchanges for a particular trading pair. Predictions are made ten minutes in advance. To count as an arbitrage opportunity, a price disparity between two exchanges must last for at least thirty minutes, and the disparity must be great enough to cover the costs of buying on one exchange and selling on the other.
+ The arbitrage models predict arbitrage opportunities between two exchanges for a particular trading pair. Predictions are made five minutes in advance. To count as an arbitrage opportunity, a price disparity between two exchanges must last for at least thirty minutes, and the disparity must be great enough to cover the costs of buying on one exchange and selling on the other.
+
+ The trained and pickled models can be accessed via the organization's AWS S3 bucket named "crypto-buckit", under the folder aws/models. Current code (as of 4 February 2020) will upload all future models into this S3 bucket.
+
+ The naming convention for the models is model\_{arbitrage/trade}\_{api}\_{trading\_pair}.pkl
+
+ The predictions themselves can be accessed via the organization's AWS RDS database with the table name "predictions".

### Features

- Each of the nine trade recommender models is trained on 67 features. Of those 67 features, five are taken directly from the OHLCV data (open, high, low, close, base_volume), one indicates where gaps were present in the data (nan_ohlcv), three indicate the time (year, month, day), and the remainder are technical analysis features.
+ Each of the nine trade recommender models is trained on 80 features. Of those 80 features, five are taken directly from the OHLCV data (open, high, low, close, base\_volume), and the remainder are technical analysis features. NaN values of open, high, low, and close are filled with the average market price, and NaN values of base\_volume are forward filled.

- Each of the 21 arbitrage models is trained on 91 features. Of those 91 features, three features indicate the time (year, month, day), and four indicate the degree and length of price disparities between two exchanges (higher_closing_price, pct_higher, arbitrage_opportunity, window_length). Half of the remaining 84 features are specific to the first of the two exchanges in a given arbitrage dataset and are labelled with the suffix "exchange_1"; the other half are specific to the second of those two exchanges and are labelled with the suffix "exchange_2". In each of these two sets of 42 features, two are taken directly from the OHLCV data (close_exchange_#, base_volume_exchange_#), one indicates where gaps were present in the data (nan_ohlcv), and the remainder are technical analysis features.
+ Each of the arbitrage models is trained on 80 features. Of those 80 features, four indicate the degree and length of price disparities between two exchanges (higher_closing_price, pct_higher, arbitrage_opportunity, window_length). Arbitrage is calculated by comparing the price of the primary exchange against the mean price of the other exchanges, which allows us to compare one market against every other market with minimal computation cost.
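As an illustrative sketch of that comparison (the exchange names, prices, and helper function here are made up, not the project's code):

```python
import pandas as pd

# Hypothetical closing prices for one trading pair across exchanges.
closes = pd.Series({"binance": 9150.0, "bitfinex": 9010.0,
                    "coinbase_pro": 9000.0, "hitbtc": 8990.0})

def pct_vs_other_markets(closes, primary):
    """Percent difference between the primary exchange's close and the
    mean close of every other exchange."""
    others_mean = closes.drop(primary).mean()
    return (closes[primary] - others_mean) / others_mean * 100
```

A single mean comparison per exchange avoids computing all pairwise exchange combinations, which is the "minimal computation cost" point made above.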

Technical analysis features were engineered with the Technical Analysis Library; they fall into five types:<br/>
(1) Momentum indicators<br/>
(2) Volume indicators<br/>
(3) Volatility indicators<br/>
(4) Trend indicators<br/>
(5) Other indicators<br/>
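The NaN handling described in the Features section (price gaps filled with the average market price, volume forward-filled) can be sketched with pandas; the frame and the `avg_price` column below are illustrative assumptions, not the project's schema:

```python
import numpy as np
import pandas as pd

# Hypothetical candle frame with a gap at row 1; `avg_price` stands in
# for the average market price used to fill the price columns.
df = pd.DataFrame({
    "open": [9000.0, np.nan, 9020.0],
    "high": [9010.0, np.nan, 9030.0],
    "low": [8990.0, np.nan, 9000.0],
    "close": [9005.0, np.nan, 9025.0],
    "base_volume": [12.0, np.nan, 7.0],
    "avg_price": [9001.0, 9003.0, 9015.0],
})

for col in ["open", "high", "low", "close"]:
    df[col] = df[col].fillna(df["avg_price"])   # price gaps -> average price
df["base_volume"] = df["base_volume"].ffill()   # volume gaps -> forward fill
```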

Documentation for the technical analysis features is available here:

@@ -81,14 +74,22 @@ Documentation for the technical analysis features is available here:

We obtained all of our data from the Cryptowatch, Bitfinex, Coinbase Pro, and HitBTC APIs. Documentation for obtaining that data is listed below:

- [Cryptowatch API OHLCV Data Documentation](https://developer.cryptowat.ch/reference/rest-api-markets#market-ohlc-candlesticks)
+ [Cryptowatch REST API OHLCV Data Documentation](https://docs.cryptowat.ch/rest-api/)

[Bitfinex API OHLCV Data Documentation](https://docs.bitfinex.com/reference#rest-public-candles)

[Coinbase Pro API OHLCV Data Documentation](https://docs.pro.coinbase.com/?r=1#get-historic-rates)

[HitBTC OHLCV Data Documentation](https://api.hitbtc.com/#candles)

[Binance API OHLCV Documentation](https://github.com/binance-exchange/binance-official-api-docs/blob/master/rest-api.md)

[Gemini REST API OHLCV Data Documentation](https://docs.gemini.com/rest-api/)

[Kraken REST API OHLCV Data Documentation](https://www.kraken.com/en-us/features/api)

[Poloniex API OHLCV Data Documentation](https://docs.poloniex.com/#introduction)

### Python Notebooks

[Notebook Folder](https://github.com/Lambda-School-Labs/cryptolytic-ds/tree/master/finalized_notebooks)
@@ -117,6 +118,11 @@ Returns: ``` {"results":"{
'prediction': 'result'}
]} ```

### Internal Access via AWS
The raw data and models can also be accessed internally through the organization's AWS accounts.
- AWS RDS: the RDS database holds historical candlestick data from the cryptocurrency market APIs, as well as the predictions from the models.
- AWS S3: the S3 bucket holds the pickled, trained models for both trading and arbitrage.
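A sketch of pulling one of the pickled models out of that bucket with boto3; the aws/models key layout and bucket name follow the README, but these helper names are illustrative, and a real call needs AWS credentials configured:

```python
def model_key(model_type, exchange_id, trading_pair, prefix="aws/models"):
    # Key layout assumed from the README's model naming convention:
    # model_{arbitrage/trade}_{api}_{trading_pair}.pkl
    return f"{prefix}/model_{model_type}_{exchange_id}_{trading_pair}.pkl"

def download_model(model_type, exchange_id, trading_pair,
                   bucket="crypto-buckit"):
    import boto3  # imported lazily; requires configured AWS credentials
    key = model_key(model_type, exchange_id, trading_pair)
    local_path = key.rsplit("/", 1)[-1]
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path
```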


## Contributing

80 changes: 46 additions & 34 deletions cryptolytic/data/__init__.py
@@ -27,9 +27,12 @@ def denoise(signal, repeat):


def resample_ohlcv(df, period=None):
"""
this function resamples ohlcv csvs for a specified candle interval;
while this can be used to change the candle interval for the data,
it can also be used to fill in gaps in the ohlcv data without changing
the candle interval
"""
# dictionary specifying which columns to use for resampling
ohlcv_dict = {'open': 'first',
'high': 'max',
@@ -38,7 +41,7 @@ def resample_ohlcv(df, period=None):
'volume': 'sum'}

# apply resampling
if period is None:
period = df['period'][0]
period = pd.to_timedelta(period, unit='s')
df_new = df.resample(period).agg(ohlcv_dict)  # .agg(): the resample(how=...) keyword was removed from pandas
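To illustrate what the resampling above does on a toy frame (not the project's data): resampling exposes a missing candle as a row of NaN price columns that can then be imputed. `.agg` is used here because the `how=` keyword no longer exists in modern pandas:

```python
import pandas as pd

# Two 5-minute candles with the 00:05 candle missing.
idx = pd.to_datetime(["2020-02-03 00:00", "2020-02-03 00:10"])
candles = pd.DataFrame({"open": [1.0, 3.0], "high": [2.0, 4.0],
                        "low": [0.5, 2.5], "close": [1.5, 3.5],
                        "volume": [10.0, 20.0]}, index=idx)

ohlcv_dict = {"open": "first", "high": "max",
              "low": "min", "close": "last", "volume": "sum"}
resampled = candles.resample("300s").agg(ohlcv_dict)
# The 00:05 gap now shows up as a row whose price columns are NaN.
```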
@@ -52,8 +55,9 @@ def nan_df(df):

def merge_candle_dfs(df1, df2):
"""Merge candle dataframes"""
merge_cols = ['trading_pair', 'exchange',
              'period', 'datetime', 'timestamp']
df_merged = df1.merge(df2, how='inner', on=merge_cols)
return df_merged


@@ -66,10 +70,14 @@ def outer_merge(df1, df2):


def fix_df(df):
"""
Changes columns to the right type if needed and makes sure the index is
set as the datetime of the timestamp. Maybe better to have pandas
infer numeric.
"""
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
numeric = ['period', 'open', 'close', 'high', 'low', 'volume',
           'arb_diff', 'arb_signal']
for col in numeric:
if col not in df.columns:
continue
@@ -80,18 +88,19 @@ def fix_df(df):

def impute_df(df):
"""
Finds the gaps in the time series data for the dataframe, and pulls the
average market price and its last volume for those values and places those
values into the gaps. Any remaining gaps or new nan values are filled
with backwards fill.
"""
df = df.copy()
return df
# resample ohclv will reveal missing timestamps to impute
gapped = resample_ohlcv(df)
gaps = nan_df(gapped).index
# stop psycopg2 error with int conversion
convert_datetime = compose(int, convert_datetime_to_timestamp)
timestamps = mapl(convert_datetime, list(gaps))
info = {'trading_pair': df['trading_pair'][0],
'period': int(df['period'][0]),
'exchange': df['exchange'][0],
@@ -107,24 +116,24 @@ def impute_df(df):
df = fix_df(df)
df['volume'] = df['volume'].ffill()
df = df.bfill().ffill()
assert not df.isna().any().any()  # pandas returns numpy.bool_, so 'is False' would always fail
return df
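The final fill step above (forward-fill volume, then backward- and forward-fill whatever is left) can be seen in isolation on a toy frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": [np.nan, 101.0, np.nan, 103.0],
                   "volume": [5.0, np.nan, np.nan, 8.0]})
df["volume"] = df["volume"].ffill()  # last known volume carries forward
df = df.bfill().ffill()              # leading NaNs come from the next value
```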


def get_df(info, n=1000):
"""
Pull info from database and give it some useful augmentation for analysis.
TODO move functionality into get_data function in historical.
"""
df = sql.get_some_candles(info=info, n=n, verbose=True)
df = impute_df(df)

df['high_m_low'] = df['high'] - df['low']
df['close_m_open'] = df['close'] - df['open']
dfarb = sql.get_arb_info(info, n)

merged = merge_candle_dfs(df, dfarb)
assert not merged.isna().any().any()  # 'is False' would always fail on numpy.bool_
return merged


@@ -135,11 +144,11 @@ def thing(arg, axis=0):
return x, mu, std


# Version 2
def normalize(A):
if isinstance(A, pd.DataFrame) or isinstance(A, pd.Series):
A = A.values
if np.ndim(A) == 1:
A = np.expand_dims(A, axis=1)
A = A.copy()
x, mu, std = thing(A, axis=0)
@@ -149,22 +158,24 @@ def normalize(A):
# from sql
A[:, i] = (x[:, i] - mu[i]) / std[i]
return A


def denormalize(values, df, col=None):
"""
Denormalize, needs the original information to be able to denormalize.
"""
values = values.copy()

def eq(x, mu, std):
return np.exp((x * std) + mu) - 1

if np.ndim(values) == 1 and col is not None:
x, mu, std = thing(df[col])
return eq(values, mu, std)
else:
for i in range(values.shape[1]):
x, mu, std = thing(df.iloc[:, i])
if isinstance(values, pd.DataFrame):
values.iloc[:, i] = eq(values.iloc[:, i], mu, std)
else:
values[:, i] = eq(values[:, i], mu, std)
@@ -177,29 +188,30 @@ def windowed(df, target, batch_size, history_size, step, lahead=1, ratio=0.8):
"""
xs = []
ys = []

x = df
y = df[:, target]

start = history_size  # 1000
end = df.shape[0] - lahead  # 4990
# 4990 - 1000 = 3990
for i in range(start, end):
indices = range(i-history_size, i, step)
xs.append(x[indices])
ys.append(y[i:i+lahead])

xs = np.array(xs)
ys = np.array(ys)

nrows = xs.shape[0]
train_size = int(nrows * ratio)
# make sure the sizes are multiples of the batch size
# (needed for some types of models)
train_size -= train_size % batch_size
val_size = nrows - train_size
val_size -= val_size % batch_size
total_size = train_size + val_size
xs = xs[:total_size]
ys = ys[:total_size]

return xs[:train_size], ys[:train_size], xs[train_size:], ys[train_size:]
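A miniature of the windowing loop above, with hypothetical sizes (history_size=3, step=1, lahead=1) on a single-column array:

```python
import numpy as np

series = np.arange(10, dtype=float).reshape(-1, 1)
history_size, step, lahead, target = 3, 1, 1, 0

xs, ys = [], []
for i in range(history_size, series.shape[0] - lahead):
    indices = range(i - history_size, i, step)
    xs.append(series[indices])                  # trailing window of inputs
    ys.append(series[:, target][i:i + lahead])  # value(s) to predict
xs, ys = np.array(xs), np.array(ys)
# xs[0] holds the window [0, 1, 2] and ys[0] holds the next value, [3].
```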
6 changes: 5 additions & 1 deletion cryptolytic/data/aws.py
@@ -32,4 +32,8 @@ def get_path(folder_name, model_type, exchange_id, trading_pair, ext):
aws_folder = os.path.join('aws', folder_name)
if not os.path.exists(aws_folder):
os.mkdir(aws_folder)
return os.path.join(
    aws_folder, f'model_{model_type}_{exchange_id}_{trading_pair}{ext}'
    # Windows operating systems use \\ instead of /; the replace call
    # is required to conform with Unix-style paths
).replace('\\', '/')
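An alternative to the manual `.replace('\\', '/')` is `pathlib`, which understands both separators; this is a sketch of the design choice, not the repo's code:

```python
from pathlib import PureWindowsPath

def to_posix(path_str):
    # PureWindowsPath accepts both '\\' and '/' as separators, and
    # as_posix() always emits forward slashes.
    return PureWindowsPath(path_str).as_posix()
```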