Correlations Between SPDR Gold & HSI | Pandas Profiling | 2019

Correlations Between SPDR Gold & HSI | Pandas Profiling | 2019

Hi In this video I am going to walk you through the process of a quick analysis on accessing the correlations between SPDR Gold Price and Hang Seng Index’ next day percentage change This technique can be applied to the other stocks Feel free to skip any parts to suit your needs Let’s begin Ok what we did last time was getting 2 stocks at once Hit the top right corner and you can revisit the video Here we get SPDR Gold and HSI from Jan 1986 all the way to this moment then import pandas and pandas datareader These are the codes Shift Enter to run the code and now I can load up to the stock data Then here I’m going to store it in a table called df, dataframe The web here is from the pandas datareader the .DataReader method is to read stock info where I defined here The list contains 2 stocks Of cause you can insert more than that I’m not going to do it here Then it uses Yahoo because I’m getting data from Yahoo Finance Start Day which is defined above and also End Day I usually use df.tail which means getting the latest 5 days Press Shift Enter run it Make sure the data is loaded once to Google Colab If you’re not using Colab Let’s say you use Sublime You might need to save it to a csv file externally But here in Colab you don’t need it You can also save to Google Drive but I don’t prefer After getting the stock data you’ll see the column Adjusted Close and Close right I usually keep adjusted close but not close Here you’ll see I use df.drop to drop the original Close column Inside the bracket I use inplace equals true which means permanently remove the Close column Now I want to rename the adjusted column to close usually load it once and then you will see the close column disappears only adjusted close remains right then I’ll rename adjusted close Here you’ll see rename bracket Inside the bracket what’s inside is just how I rename the column from adjusted close which is inside the single quote and this column means changing to close The method is axis equals one changing columns instead of rows If axis is zero by default it’s zero changing rows Here we’re changing columns not rows so axis equals one inplace equals true means the change is permanent Alright Shift Enter then I’ll see Adjusted Close becomes Close now When you look closer August 1st has no figures but HSI does have values that’s why you see Nan I want to get rid of the NaNs which don’t have any values So df.dropna then print it out again df.tail Shift Enter Here you’ll see August 1st row disappears As long as the row has NaN which means no value will be removed That’s all what we did last time Let’s dive in to today’s tasks Firstly I need to know the next trade day by how much HSI is going up or down and the correlation with another factor The term ‘factor’ are simply the table columns Of cause we can always add new columns from our last video High, Low, Close and next day’s up and down percentage have very low correlations Now we need to calculate the next day’s up and down percentage Here you see nday change equals one That means the number of days you look ahead instead of 1 day ahead you can put 10 or 5 days This can be used in codes below Here you can see I first calculate the current day’s percentage change using current close price in comparison with the previous day’s Let’s say SPDR Gold we want to know the percentage up or down on on Jul 31 from Jul 30 133.21 minus 134.97 and then divided by 134.97 You should see negative 1.311297 percent Here I applied to Gold Close price a method called percentage change, just like this one If you won’t put in any numbers inside the bracket like 1 or 2 by default it means the percentage change from previous day You expect to see 0.0 something I’ll just multiply 100 to it The same can be applied to the percentage change of HSI That’s why I only change this to HSI Take a look here Control Slash and then Shift Enter Take a look here df and then gold day change Let me show you the last 5 days Shift Enter Then you’ll see negative 1.31 percent which is here 133.21 minus 134.97 then over 134.97 But we’re not interested in today’s change we want to know tomorrow’s change So I go here I just give it a name df which is my table inside the single quote is the column name GLD nday change it means nday after and the n day is one so GLD nday change is simply here GLD or HSI’s current day change shifting down to the next day That’s why here in this column shifting a positive number of days means going to previous day which is not what we want I don’t need last day but tomorrow’s percentage change, right That’s why, shift a negative nday where we defined earlier here Ndays equal 1 Similarly do the same to HSI see if I’m right df and then a square bracket Inside it add one more square bracket Inside that bracket I put in column one comma column two comma and so on what I’m showing here is today’s percentage change GLD day change which we did earlier HSI day change and GLD nday change tomorrow and HSI nday change tomorrow I don’t need to print it all here I just use .tail the last 5 days Shift Enter You’ll see NaN here on Jul 31 because it’s not Aug 1 yet Simply say GLD current day change is negative 1.31 percent View it from July 30, the previous day The nday change which is Jul 31 is negative 1.31% This applies to the whole column Now I want to use the GLD’s n_value that’s just the fluctuation of gold price If you want to go deep on how I calculate the n_value you can watch the video up top I’m not going to repeat too much here Now I use GLD’s n_value to compare with HSI’s nday change which is this one HSI nday change column to access their correlations I now calculate the n_value first df, which is my table Create a new column this column is gold’s day high minus last day’s close Gold’s day high is simply the GLD column under the high column, here It minus gold’s close from the previous day Just call it brange It’s simply the high column of gold minus previous day close which is the gold column inside close GLD Here I use .shift I need a positive number because when you look back I use a .shift method A negative shift means looking ahead A positive shift means shifting backwards By default, the shift is positive one but I prefer to put one here just to remind myself because sometimes when you have multiple shifts it can be confusing Remember when calculating price fluctuations always take the absolute value by wrapping with abs bracket This number no matter if it’s positive or negative will become positive That’s gold’s current high minus previous close Then I add a column called crange It’s similar to brange The only difference is low instead of high Gold’s low today minus previous close Then calculate gold’s high low today Just call it arange Arange is Today’s high minus today’s low It must be positive but I prefer to keep the abs bracket high minus low You don’t need to shift because it’s today after that I usually print it out like this one Just give it a name table target This table is simply df which is the data table above .loc and then the pattern is rows, comma columns We need all the rows right so we use colon to represent For columns I don’t need to include all columns I only pick some of them so I add a square bracket Inside that feed the column names The columns I need are HSI’s day change tomorrow HSI nday change which we defined early on Then I’ll compare it to the arange let me add comments here GLD which is gold’s ATR average true range I’ll add one more column below This one is true range just come back ATR is df.loc row and column inside right then .max axis equals one applied to the columns For rows we need all the rows and so for columns include from arange to crange, let me see if it works brange crange arange Shift Enter Yup Let’s check it once ATR is the maximum of brange, crange, and arange 1.2799 is more than 1.19 and 0.89 Good I usually skim through the data and check if it works This true range takes an average It can be 20-day moving average 14-day moving average It’s totally up to you I just pick 20-day moving average about a month of time 20 trade days This is the n_value The column is being added here so that we can compare with HSI’s nday change df’s ATR which was defined earlier Inside that apply a rolling method equals 20 or 14 days 5 days, 2 days, or whatever 20 days and .mean Here I’m including the column to table target Shift Enter Our main goal is to get this column and this one Let me add a new table called compare This is simply df.loc It’s purely for display Feel free to skip any steps if you like provided you know what you’re doing I only need n_value and HSI nday change Shift Enter This is what we want then import pandas profiling as ppf ppf is defined here Profile Report Inside the bracket fill in the argument the table compare this one Shift Enter then it will compile a report to compare the 2 columns Let me see Oh we missed 0.2% of all records It doesn’t matter It’s just a small fraction of a total of several thousands Oops we have about 3000 records distinct count only 3500 See here correlations, Pearson You can see HSI day change which is the percentage change next day Of cause its relationship with itself is 100% that’s why it shows deep red because they are the same group of data But HSI’s nday change to gold n_value is a very light blue It’s close to zero If it shows deep blue it means when gold fluctuation increases HSI next day will go down Just go the opposite way if it’s blue. But it isn’t It’s light blue You can say they’re not correlated This is what the Pearson method concludes Spearman also draws similar conclusions They’re both light blue They don’t seem correlated This column, gold price 20-day n_value 因為我哋用20天啦 when comparing to HSI don’t seem very much related That’s means we have to look for another factor Of cause it can’t be done shortly We basically keep searching the column that has high correlations with our target column which is the HSI’s next day change That’s a simple analysis Hope it helps Thank you

Be the first to comment

Leave a Reply

Your email address will not be published.