Hi all,
I'm investigating the quality of a few free data sources, measuring it by calculating the gaps between consecutive bars with a MATLAB script I wrote. Here are the results for EURUSD M1 data from Alpari UK:
>> checkcsvgaps('EURUSD1.csv', 30000)
Preallocating structured data for better performance... done (elapsed time: 0.032 s)
Reading file into the main memory (ETA: 6.00 s)... done (elapsed time: 6.922 s)
Checking file for gaps... done (elapsed time: 0.609 s)
Bars read: 30000
Gaps: 24710 (82.37 percent of bars)
Total gap: 48256.00 pips
Average bar gap: 1.608 pips
Standard deviation (in pips) on average gap: 2.1752
http://img505.imageshack.us/img505/4370/alparigaps.jpg
And this is the same check on FXDD data:
>> checkcsvgaps('EURUSD2.csv', 30000)
Preallocating structured data for better performance... done (elapsed time: 0.032 s)
Reading file into the main memory (ETA: 6.00 s)... done (elapsed time: 6.093 s)
Checking file for gaps... done (elapsed time: 0.469 s)
Bars read: 30000
Gaps: 6025 (20.08 percent of bars)
Total gap: 6183.00 pips
Average bar gap: 0.206 pips
Standard deviation (in pips) on average gap: 0.4662
http://img403.imageshack.us/img403/350/gapdistr.jpg
So that suggests FXDD data is better (a LOT better in this sample).
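For anyone who wants to reproduce numbers like the ones above, here is a minimal Python sketch of the kind of gap statistics the script prints. It is not the MATLAB script itself, and several things are assumptions: a bar's gap is taken to be the absolute difference between its open and the previous bar's close, the pip size is 0.0001 (EURUSD), and the standard deviation is the population standard deviation of the per-bar gaps. The function name `gap_stats` is made up for illustration. The average is the total gap divided by the number of bars read, which matches the figures above (for example 48256 / 30000 = 1.608).

```python
from statistics import pstdev


def gap_stats(opens, closes, pip=0.0001):
    """Summarise open-vs-previous-close gaps for a series of bars.

    Assumed definition: gap of bar i = |open[i] - close[i-1]|, in whole pips.
    """
    if len(opens) != len(closes):
        raise ValueError("opens and closes must have the same length")
    n = len(opens)
    # The first bar has no predecessor, so it contributes no gap.
    gaps = [round(abs(opens[i] - closes[i - 1]) / pip) for i in range(1, n)]
    nonzero = [g for g in gaps if g > 0]
    total = sum(gaps)
    return {
        "bars": n,
        "gaps": len(nonzero),
        "gap_percent": 100.0 * len(nonzero) / n if n else 0.0,
        "total_pips": total,
        # Averaged over bars read, as the figures in the output suggest.
        "average_pips": total / n if n else 0.0,
        # Assumption: population standard deviation of the per-bar gaps.
        "std_pips": pstdev(gaps) if gaps else 0.0,
    }
```

For a JPY pair such as AUDJPY, `pip=0.01` would be the matching pip size.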
This is one month of AUDJPY tick data from GAIN Capital, which I had previously converted to M1 bars:
>> checkcsvgaps('AUDJPY.csv', 30000)
Preallocating structured data for better performance... done (elapsed time: 0.015 s)
Reading file into the main memory (ETA: 6.00 s)... done (elapsed time: 5.563 s)
Checking file for gaps... done (elapsed time: 0.578 s)
Bars read: 26904
Gaps: 24087 (89.53 percent of bars)
Total gap: 25826.00 pips
Average bar gap: 0.960 pips
Standard deviation (in pips) on average gap: 1.0771
(There are fewer than 30k bars because the file holds only one month of data, which is roughly 24*60*5/7 M1 bars per calendar day.)
http://img411.imageshack.us/img411/1957/gainm1.jpg
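The tick-to-M1 conversion mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the conversion actually used for this file: it assumes the ticks are already sorted by time and that each tick carries a single price (bid or mid, whichever the feed provides). The name `ticks_to_m1` is hypothetical.

```python
from datetime import datetime


def ticks_to_m1(ticks):
    """Aggregate (timestamp, price) ticks, sorted by time, into M1 bars.

    Returns a list of (minute, open, high, low, close) tuples.
    """
    bars = []
    for ts, price in ticks:
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        if bars and bars[-1][0] == minute:
            # Same minute as the current bar: update high, low and close.
            _, o, h, l, _ = bars[-1]
            bars[-1] = (minute, o, max(h, price), min(l, price), price)
        else:
            # First tick of a new minute opens a new bar.
            bars.append((minute, price, price, price, price))
    return bars
```

In this sketch a minute without any ticks produces no bar at all, so the bar count depends on how active the feed is.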
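However, if you think about it, a 1-pip gap between one bar and the next isn't really an error in tick data: the price can simply move by one pip between the last tick of one minute and the first tick of the next. Still, judging by the 2-pip gaps, I'd say FXDD is the best.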
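To compare sources on the larger gaps only, one option is to tabulate the gaps by size and look at the share at or above a threshold. The Python sketch below assumes you already have a list of per-bar gap sizes in whole pips; the function names and the default threshold of 2 pips are illustrative choices, not something the script above is known to do.

```python
from collections import Counter


def gap_histogram(gap_pips):
    """Count how many bars show each gap size (in whole pips)."""
    return dict(sorted(Counter(gap_pips).items()))


def share_at_least(gap_pips, threshold=2):
    """Percentage of bars whose gap is at least `threshold` pips."""
    if not gap_pips:
        return 0.0
    hits = sum(1 for g in gap_pips if g >= threshold)
    return 100.0 * hits / len(gap_pips)


gaps = [0, 1, 1, 2, 3, 0, 1, 2]
print(gap_histogram(gaps))      # {0: 2, 1: 3, 2: 2, 3: 1}
print(share_at_least(gaps, 2))  # 37.5
```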
My question to you: apart from bar gaps, what other metrics would you use to assess the quality of historical data? And can you recommend other free data sources I should test?