Decompress And Read Dukascopy .bi5 Tick Files
Solution 1:
The code below should do the trick. It opens the file, decompresses it with lzma, and then uses struct to unpack the binary data.
import lzma
import struct
import pandas as pd

def bi5_to_df(filename, fmt):
    # Size in bytes of one record for the given struct format
    chunk_size = struct.calcsize(fmt)
    data = []
    with lzma.open(filename) as f:
        while True:
            chunk = f.read(chunk_size)
            if len(chunk) < chunk_size:
                # End of file (or a truncated trailing record)
                break
            data.append(struct.unpack(fmt, chunk))
    df = pd.DataFrame(data)
    return df
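As a variation on the loop above, the whole payload can be decompressed in one go and unpacked with struct.iter_unpack. This is only a sketch: the function name bi5_bytes_to_rows is made up for illustration, the default format is the '>3I2f' discussed in this answer, and the sample payload is synthetic rather than real tick data.

```python
import lzma
import struct

def bi5_bytes_to_rows(compressed, fmt='>3I2f'):
    """Decompress a whole .bi5 payload and unpack every complete record."""
    data = lzma.decompress(compressed)
    size = struct.calcsize(fmt)
    # iter_unpack needs a length that is a multiple of the record size,
    # so drop a trailing partial record if there is one.
    usable = len(data) - len(data) % size
    return list(struct.iter_unpack(fmt, data[:usable]))

# Synthetic payload (made-up values) just to show the round trip.
payload = struct.pack('>3I2f', 210, 110218, 110216, 1.5, 2.25) * 2
rows = bi5_bytes_to_rows(lzma.compress(payload))
print(rows)
```

The returned list of tuples can be passed to pd.DataFrame in the same way as the data list in bi5_to_df.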
The most important thing is to know the right format. I googled around and experimented, and '>3i2f' (or '>3I2f') works quite well: it is big-endian, three 4-byte ints followed by two 4-byte floats. What you suggest, 'i4f', doesn't produce sensible floats, regardless of whether it is read as big or little endian. For struct and its format syntax, see the docs.
df = bi5_to_df('13h_ticks.bi5', '>3i2f')
df.head()
Out[177]:
      0       1       2     3     4
0   210  110218  110216  1.87  1.12
1   362  110219  110216  1.00  5.85
2   875  110220  110217  1.00  1.12
3  1408  110220  110218  1.50  1.00
4  1884  110221  110219  3.94  1.00
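To sanity-check a format string before pointing it at a real file, struct.calcsize and a pack/unpack round trip are enough. The sketch below uses made-up values, not real market data.

```python
import struct

FMT = '>3I2f'  # big-endian: three unsigned 32-bit ints, two 32-bit floats

# With an explicit byte order there is no padding: 3*4 + 2*4 bytes per record.
record_size = struct.calcsize(FMT)

# Build one synthetic record and unpack it again.
raw = struct.pack(FMT, 210, 110218, 110216, 1.5, 2.25)
fields = struct.unpack(FMT, raw)

print(record_size)  # 20
print(fields)       # (210, 110218, 110216, 1.5, 2.25)
```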
Update
To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy, I compiled and ran test_read_bi5 from there. The first lines of the output are:
time, bid, bid_vol, ask, ask_vol
2012-Dec-03 01:00:03.581000, 131.945, 1.5, 131.966, 1.5
2012-Dec-03 01:00:05.142000, 131.943, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.202000, 131.943, 1.5, 131.964, 2.25
2012-Dec-03 01:00:05.321000, 131.944, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.441000, 131.944, 1.5, 131.964, 1.5
And bi5_to_df on the same input file gives:
bi5_to_df('01h_ticks.bi5', '>3I2f').head()
Out[295]:
      0       1       2     3    4
0  3581  131966  131945  1.50  1.5
1  5142  131964  131943  1.50  1.5
2  5202  131964  131943  2.25  1.5
3  5321  131964  131944  1.50  1.5
4  5441  131964  131944  1.50  1.5
So everything seems to be fine (ninety47's code reorders columns).
Also, it's probably more accurate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
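Going by the comparison above, the five fields appear to be: milliseconds since the start of the hour, ask, bid, ask volume, bid volume. A minimal sketch of turning one raw row into readable values follows. The helper decode_tick and its point parameter are hypothetical; the divisor 1000 is an assumption that happens to reproduce the ninety47 prices for this file, and other instruments may need a different one (Solution 2 uses 100000 for EURUSD).

```python
from datetime import datetime, timedelta

def decode_tick(row, hour_start, point=100000):
    """Map one raw (ms, ask, bid, ask_vol, bid_vol) tuple to named values.

    hour_start: datetime for the start of the hour the file covers.
    point: price divisor (assumption; depends on the instrument).
    """
    ms, ask, bid, ask_vol, bid_vol = row
    return {
        'time': hour_start + timedelta(milliseconds=ms),
        'ask': ask / point,
        'bid': bid / point,
        'ask_vol': ask_vol,
        'bid_vol': bid_vol,
    }

# First row of the 01h_ticks.bi5 output shown above.
tick = decode_tick((3581, 131966, 131945, 1.5, 1.5),
                   datetime(2012, 12, 3, 1), point=1000)
print(tick['time'], tick['bid'], tick['ask'])
```

A list of such dicts can be handed straight to pd.DataFrame if named columns are wanted.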
Solution 2:
import requests
import struct
from lzma import LZMADecompressor, FORMAT_AUTO
# download the compressed EURUSD 2020/06/15/10h_ticks.bi5 file
res = requests.get("https://www.dukascopy.com/datafeed/EURUSD/2020/06/15/10h_ticks.bi5", stream=True)
print(res.headers.get('content-type'))
rawdata = res.content
decomp = LZMADecompressor(FORMAT_AUTO, None, None)
decompresseddata = decomp.decompress(rawdata)
firstrow = struct.unpack('!IIIff', decompresseddata[0: 20])
print("firstrow:", firstrow)
# firstrow: (436, 114271, 114268, 0.9399999976158142, 0.75)
# time = 2020/06/15/10h + (1 month) + 436 milliseconds
secondrow = struct.unpack('!IIIff', decompresseddata[20: 40])
print("secondrow:", secondrow)
# secondrow: (537, 114271, 114267, 4.309999942779541, 2.25)
# time = 2020/06/15/10h + (1 month) + 537 milliseconds
# ask = 114271 / 100000 = 1.14271
# bid = 114267 / 100000 = 1.14267
# askvolume = 4.31
# bidvolume = 2.25
# note that the month in the URL is zero-based: 00 is January
# "https://www.dukascopy.com/datafeed/EURUSD/2020/00/15/10h_ticks.bi5" for January
# "https://www.dukascopy.com/datafeed/EURUSD/2020/01/15/10h_ticks.bi5" for February
# iterating
print(len(decompresseddata), int(len(decompresseddata) / 20))
for i in range(0, int(len(decompresseddata) / 20)):
    print(struct.unpack('!IIIff', decompresseddata[i * 20: (i + 1) * 20]))
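Because the month in the URL is zero-based, it is easy to request the wrong file. The sketch below builds the URL from a datetime. The helper name bi5_url is made up for illustration, and it assumes that only the month is shifted, with day and hour used as-is and zero-padded, which matches the example URLs in this answer.

```python
from datetime import datetime

BASE = "https://www.dukascopy.com/datafeed"

def bi5_url(symbol, hour):
    """Build the tick-file URL for the hour containing `hour`.

    The month is written zero-based (00 = January); day and hour are
    assumed to be used unchanged.
    """
    return (f"{BASE}/{symbol}/{hour.year}/{hour.month - 1:02d}/"
            f"{hour.day:02d}/{hour.hour:02d}h_ticks.bi5")

# 15 July 2020, 10:00 maps to the URL used in the example above.
print(bi5_url("EURUSD", datetime(2020, 7, 15, 10)))
# https://www.dukascopy.com/datafeed/EURUSD/2020/06/15/10h_ticks.bi5
```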
Solution 3:
Have you tried using NumPy to parse the data before transferring it to pandas? It may be a longer route, but it lets you manipulate and clean the data before you do the analysis in pandas, and the integration between the two is pretty straightforward.
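A minimal sketch of what that could look like, assuming the 20-byte big-endian record layout from the other answers. The field names and the helper parse_ticks are illustrative, and the payload here is synthetic rather than a real decompressed file.

```python
import struct
import numpy as np

# Structured dtype for one record: big-endian, 3 x uint32 then 2 x float32.
# Field names are illustrative; the order follows the other answers.
TICK_DTYPE = np.dtype([
    ('ms', '>u4'), ('ask', '>u4'), ('bid', '>u4'),
    ('ask_vol', '>f4'), ('bid_vol', '>f4'),
])

def parse_ticks(decompressed):
    """View already-decompressed bytes as an array of tick records."""
    # Ignore a trailing partial record, if any.
    usable = len(decompressed) - len(decompressed) % TICK_DTYPE.itemsize
    return np.frombuffer(decompressed[:usable], dtype=TICK_DTYPE)

# Synthetic payload (made-up volumes) standing in for decompressed data.
payload = (struct.pack('>3I2f', 436, 114271, 114268, 0.75, 0.75)
           + struct.pack('>3I2f', 537, 114271, 114267, 1.5, 2.25))
ticks = parse_ticks(payload)
print(ticks['ms'])
print(ticks['ask'] / 100000)
```

Filtering and cleaning can then be done on the array fields directly. Before handing the result to pandas it may be necessary to convert the fields to native byte order, for example with astype.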