Analysis package

Submodules

Analysis.DataframeAnalysis module

class Analysis.DataframeAnalysis.DataframeAnalysis(root_path, data_path)

Bases: object

getADF(start_col=None, end_col=None)

Get the Augmented Dickey-Fuller (ADF) test result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The ADF result of each column.

Return type:

dict
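To make the kind of statistic behind this method concrete, here is a minimal NumPy sketch of the plain (non-augmented) Dickey-Fuller regression. It is not this package's implementation: the helper name `dickey_fuller_stat` and the sample series are hypothetical, and the augmented test additionally includes lagged differences in the regression. The statistic must be compared against Dickey-Fuller critical values, not the usual t distribution.

```python
import numpy as np

def dickey_fuller_stat(y):
    """Hypothetical helper: t-statistic of rho in diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    # Design matrix: constant term plus the lagged level y_{t-1}.
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    # Ordinary least squares variance estimate and coefficient covariance.
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
columns = {
    "noise": rng.normal(size=500),            # stationary series
    "walk": np.cumsum(rng.normal(size=500)),  # random walk (has a unit root)
}
# One entry per column, mirroring the documented dict return type.
df_stats = {name: dickey_fuller_stat(values) for name, values in columns.items()}
```

A strongly negative statistic, as expected for the stationary column, is evidence against a unit root.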

getAverageColumn(start_col=None, end_col=None)

Get the average of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The average value of each column.

Return type:

pd.DataFrame
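The start_col / end_col range shared by most methods can be pictured with plain pandas. This is a minimal sketch, not the package's implementation: it assumes the range is an inclusive, label-based column slice and that None leaves the corresponding end open. The helper name `average_columns` and the sample data are hypothetical.

```python
import pandas as pd

def average_columns(df, start_col=None, end_col=None):
    """Hypothetical stand-in for getAverageColumn: mean of each selected column."""
    # Label-based slicing with .loc includes both end points; None leaves that end open.
    selected = df.loc[:, start_col:end_col]
    # Return a one-row DataFrame to match the documented pd.DataFrame return type.
    return selected.mean().to_frame(name="mean").T

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]})
result = average_columns(df, start_col="a", end_col="b")
all_columns = average_columns(df)
```

Here `result` covers only columns a and b, while `all_columns` covers the whole frame.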

getCorr(method='pearson', start_col=None, end_col=None)

Get the pairwise correlation between the columns of the target dataset from the starting column to the ending column.

Parameters:
  • method (str) – The correlation method: 'pearson', 'kendall', or 'spearman'.

  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The pairwise correlation matrix of the selected columns.

Return type:

pd.DataFrame
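The three method names match those accepted by pandas' `DataFrame.corr`, so the result can be pictured as below. That this method delegates to pandas is an assumption, and the sample data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly linear in x
    "z": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly decreasing in x
})

# Pairwise Pearson correlation of every column with every other column.
pearson = df.corr(method="pearson")

# Rank-based correlation restricted to an inclusive column range.
spearman = df.loc[:, "x":"y"].corr(method="spearman")
```

The result is a square matrix indexed by column name on both axes, with ones on the diagonal.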

getDFGLS(start_col=None, end_col=None)

Get the DF-GLS result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The DF-GLS result of each column.

Return type:

dict

getFFTtopk(col, top_k_seasons=3)

Get the Fast Fourier Transform result of a given column in the target dataset.

Parameters:
  • col – The input column.

  • top_k_seasons (int) – The number of strongest seasonal components (k) to keep.

Returns:

The Fast Fourier Transform result for the given column.
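One common way to extract the top k seasonal periods from a Fourier transform is sketched below. The exact return format of this method is not documented here, so the helper name `fft_top_k_periods`, its (periods, amplitudes) return value, and the synthetic signal are all assumptions for illustration.

```python
import numpy as np

def fft_top_k_periods(values, top_k_seasons=3):
    """Hypothetical helper: periods (in samples) of the strongest frequency components."""
    values = np.asarray(values, dtype=float)
    # Remove the mean and drop the zero-frequency bin, which has no finite period.
    amplitudes = np.abs(np.fft.rfft(values - values.mean()))[1:]
    freqs = np.fft.rfftfreq(len(values))[1:]  # cycles per sample
    # Indices of the k largest amplitudes, strongest first.
    order = np.argsort(amplitudes)[::-1][:top_k_seasons]
    return 1.0 / freqs[order], amplitudes[order]

t = np.arange(240)
signal = 3.0 * np.sin(2 * np.pi * t / 12) + 1.0 * np.sin(2 * np.pi * t / 40)
periods, strengths = fft_top_k_periods(signal, top_k_seasons=2)
```

For this synthetic signal the two recovered periods are 12 and 40 samples, ordered by amplitude.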

getInterpolate(start_col=None, end_col=None, **kwargs)

Get the interpolated result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

  • kwargs – Keyword arguments passed to the interpolation routine.

Returns:

The interpolated result of each column in the target dataset from the starting column to the ending column.
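If the keyword arguments are forwarded to pandas' `DataFrame.interpolate` (an assumption, since the underlying routine is not named here), filling gaps over a column range would look roughly like this. The sample data and the `interp_kwargs` name are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [10.0, np.nan, 30.0],
    "c": [0.0, np.nan, 2.0],
})

# Keyword arguments collected once and forwarded unchanged.
interp_kwargs = {"method": "linear"}

# Interpolate only the inclusive column range a..b; column c is left out.
filled = df.loc[:, "a":"b"].interpolate(**interp_kwargs)
```

Linear interpolation replaces each interior gap with the value on the straight line between its neighbours.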

getKPSS(start_col=None, end_col=None)

Get the KPSS result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The KPSS result of each column.

Return type:

dict

getMaxColumn(start_col=None, end_col=None)

Get the maximum of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The max of each column.

Return type:

pd.DataFrame

getMedianColumn(start_col=None, end_col=None)

Get the median of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The median of each column.

Return type:

pd.DataFrame

getMinColumn(start_col=None, end_col=None)

Get the minimum of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The min of each column.

Return type:

pd.DataFrame

getNanIndex(start_col=None, end_col=None)

Get the indices that contain NaN in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The indices that contain NaN in the target dataset from the starting column to the ending column.

Return type:

np.array
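A minimal sketch of locating rows with missing values in a column range follows. Whether this method returns row labels, positions, or (row, column) pairs is not stated here; the sketch assumes row labels, and the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [1.0, 2.0, 3.0, np.nan],
    "c": [np.nan, 1.0, 1.0, 1.0],
})

# Restrict to the inclusive column range a..b, so the NaN in column c is ignored.
selected = df.loc[:, "a":"b"]

# Row labels where at least one selected column is NaN, as a NumPy array.
nan_index = selected.index[selected.isna().any(axis=1)].to_numpy()
```

Row 0 is not reported because its only missing value lies outside the selected range.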

getPhillipsPerron(start_col=None, end_col=None)

Get the Phillips-Perron result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The Phillips-Perron result of each column.

Return type:

dict

getQuantileColumn(percent=[0.25, 0.5, 0.75], start_col=None, end_col=None)

Get the quantile of each column in the target dataset from the starting column to the ending column.

Parameters:
  • percent (list) – The quantile levels to compute; defaults to [0.25, 0.5, 0.75].

  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The quantile of each column.

Return type:

pd.DataFrame
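The default levels correspond to the quartiles. Assuming the computation resembles pandas' `DataFrame.quantile` (not confirmed here), the result has one row per requested level and one column per selected column. The sample data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})

percent = [0.25, 0.5, 0.75]

# One row per quantile level, one column per selected column.
quantiles = df.loc[:, "a":"b"].quantile(percent)
```

The row labelled 0.5 is the median of each column.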

getSelfCorr(lag=1, start_col=None, end_col=None)

Get the autocorrelation (self correlation) of each column in the target dataset from the starting column to the ending column.

Parameters:
  • lag (int) – The lag used to calculate the autocorrelation.

  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The autocorrelation of each column.

Return type:

pd.DataFrame
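Correlating a column with a lagged copy of itself can be sketched with pandas' `Series.autocorr`. The helper name `self_corr`, the one-row layout of the result, and the sample data are assumptions for illustration, not the package's actual behaviour.

```python
import pandas as pd

def self_corr(df, lag=1):
    """Hypothetical helper: lagged correlation of each column with itself."""
    # Series.autocorr computes the Pearson correlation between x_t and x_{t-lag}.
    values = {col: [df[col].autocorr(lag=lag)] for col in df.columns}
    return pd.DataFrame(values, index=[f"lag_{lag}"])

df = pd.DataFrame({
    "alternating": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "trend": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
lag1 = self_corr(df, lag=1)
lag2 = self_corr(df, lag=2)
```

The alternating column is perfectly anti-correlated with itself at lag 1 and perfectly correlated at lag 2, which is the signature of a period-2 pattern.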

getShape()

Get the shape of the target dataset.

Returns:

The shape of the target dataset.

getStdColumn(start_col=None, end_col=None)

Get the standard deviation of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The standard deviation value of each column.

Return type:

pd.DataFrame

getVarianceColumn(start_col=None, end_col=None)

Get the variance of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The variance value of each column.

Return type:

pd.DataFrame
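The per-column summary statistics in this class (minimum, maximum, median, standard deviation, variance) can be reproduced in plain pandas as a cross-check. Note that pandas defaults to the sample estimators (ddof=1) for the standard deviation and variance; whether this class uses the same convention is an assumption. The sample data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
    "b": [1.0] * 8,
})

# One row per statistic, one column per selected column.
summary = df.loc[:, "a":"b"].agg(["min", "max", "median", "std", "var"])
```

For column a the sample variance is 32/7, while the population variance (ddof=0) would be 4, so the convention matters when comparing results.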

getVarianceRatio(start_col=None, end_col=None)

Get the Variance Ratio result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The Variance Ratio result of each column.

Return type:

dict
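For intuition about what a variance ratio measures, here is a simplified NumPy sketch without bias or heteroskedasticity corrections. It is not this package's implementation: the helper name `variance_ratio` and its parameter `q` are hypothetical, since the method documented here takes no lag argument.

```python
import numpy as np

def variance_ratio(x, q=2):
    """Hypothetical helper: Var(q-step change) / (q * Var(1-step change))."""
    x = np.asarray(x, dtype=float)
    one_step = np.diff(x)      # x_t - x_{t-1}
    q_step = x[q:] - x[:-q]    # x_t - x_{t-q}, overlapping windows
    return np.var(q_step, ddof=1) / (q * np.var(one_step, ddof=1))

rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=5000))
alternating = np.array([1.0, 0.0] * 50)

vr = {
    "random_walk": variance_ratio(random_walk, q=2),  # close to 1 for a random walk
    "alternating": variance_ratio(alternating, q=2),  # 0: two-step changes cancel exactly
}
```

A ratio near 1 is consistent with a random walk, while a ratio well below 1 indicates mean reversion.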

getZivotAndrews(start_col=None, end_col=None)

Get the Zivot-Andrews result of each column in the target dataset from the starting column to the ending column.

Parameters:
  • start_col (str) – The starting column.

  • end_col (str) – The ending column.

Returns:

The Zivot-Andrews result of each column.

Return type:

dict

Module contents