ICT706 Data Analytics Assignment

Research Project:

In this research project you will undertake a data analytics approach to solve a set of business problems that require the use of appropriately selected data processing and mining approaches.

Answer:

In [1]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
import seaborn as sns
from scipy import stats
from scipy.stats import kurtosis
from scipy.stats import skew
from sklearn.model_selection import train_test_split

Part A: Load and Clean Data
Part B: Data Exploration
Part C: Predicting Spending Levels
Part D: Predicting Big Spenders
Part E: Business Recommendations

Write Python code to load your dataset into a Pandas DataFrame called 'sales'.

In [2]:
train_data = pd.read_csv('Train_UWu5bXk.csv',header=0)
test_data = pd.read_csv('Test_u94Q5KV.csv',header=0)
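
The task statement above asks for the data to be loaded into a DataFrame called 'sales'. If that exact name is required by the marker, a minimal extra line would look like the sketch below (it assumes the training CSV is the dataset in question; the rest of the notebook continues to use train_data and test_data):

# Sketch: alias matching the task wording; assumes the training CSV is the intended dataset
sales = pd.read_csv('Train_UWu5bXk.csv', header=0)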

In [3]:
train_data.head(10)
Out[3]:

In [4]:
train_data.info()

In [5]:
train_data.describe()
Out[5]:

In [6]:
test_data['Item_Outlet_Sales'] = 0

In [7]:
df = pd.concat([train_data,test_data])

In [8]:
df.head(10)
Out[8]:

In [9]:
df.shape
Out[9]:
(14204, 12)

In [10]:
df.isnull().sum(axis = 0)
Out[10]:

In [11]:
# Fat content
print(df.Item_Fat_Content.unique())
df.loc[df.Item_Fat_Content.isin(['LF','low fat']), 'Item_Fat_Content'] = 'Low Fat'
df.loc[df.Item_Fat_Content.isin(['reg']), 'Item_Fat_Content'] = 'Regular'
print(df.Item_Fat_Content.value_counts())
print(df.Item_Type.unique())
print(df.groupby('Item_Type')['Item_Fat_Content'].count())
df.loc[df.Item_Type.isin(['Health and Hygiene','Household','Others']), 'Item_Fat_Content'] = 'None'
print(df.Item_Fat_Content.value_counts())

In [12]:
sns.boxplot(df.Item_Type, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()

In [13]:
sns.boxplot(df.Outlet_Identifier, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()

In [14]:
## OUT027 and OUT019 don't have any item weights recorded.
## Fill missing values in Item_Weight with the mean weight of the corresponding Item_Identifier.
weights_mean = df.groupby('Item_Identifier', as_index=False)['Item_Weight'].mean()
print(weights_mean.head(5))

In [15]:
df['Item_Weight'] = df.apply(
    lambda row: weights_mean.loc[
        weights_mean['Item_Identifier'] == row['Item_Identifier'], 'Item_Weight'
    ].iloc[0] if np.isnan(row['Item_Weight']) else row['Item_Weight'],
    axis=1
)
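
A more concise alternative to the row-wise apply above is a vectorized group-mean imputation (a sketch using pandas groupby/transform on the same df; it assumes every Item_Identifier has at least one recorded weight):

# Sketch: fill missing Item_Weight with the mean weight of the same Item_Identifier
df['Item_Weight'] = df['Item_Weight'].fillna(
    df.groupby('Item_Identifier')['Item_Weight'].transform('mean')
)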

In [16]:
df.isnull().sum(axis = 0)
df.Item_Weight = df.Item_Weight.astype(float)

In [17]:
sns.boxplot(df.Item_Type, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()

In [18]:
df.info()

In [19]:
sns.boxplot(df.Outlet_Identifier, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()


In [20]:
df['year'] = 2013 - df.Outlet_Establishment_Year
df = df.drop(['Outlet_Establishment_Year'],axis=1)

In [21]:
df.columns
Out[21]:
Index(['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Size',
'Outlet_Location_Type', 'Outlet_Type', 'Item_Outlet_Sales', 'year'],
dtype='object')

In [22]:
sns.kdeplot(df.Item_MRP,shade=True)
plt.axvline(x=70,color="blue")
plt.axvline(x=137,color="blue")
plt.axvline(x=210,color="blue")
Out[22]:
<matplotlib.lines.Line2D at 0x2e50ee46630>

In [23]:
### There are four different price ranges. Let's introduce an MRP_level variable to account for that.
conditions = [
(df['Item_MRP'] < 70),
(df['Item_MRP'] < 137),
(df['Item_MRP'] < 210),
(df['Item_MRP'] >210)]
choices = ['Low', 'Medium', 'High','Very high']
df['MRP_level'] = np.select(conditions, choices)
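
An equivalent way to create the same bins is pd.cut (a sketch; the bin edges are the ones read off the density plot above, boundary handling at the exact cut points differs slightly, and the result is cast to str so it behaves like the np.select version during the later one-hot encoding):

# Sketch: same MRP binning with pd.cut; astype(str) keeps the column as object dtype
df['MRP_level'] = pd.cut(df['Item_MRP'],
                         bins=[0, 70, 137, 210, np.inf],
                         labels=['Low', 'Medium', 'High', 'Very high']).astype(str)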

In [24]:
df.MRP_level.head(10)
Out[24]:
0 Very high
1 Low
2 High
3 High
4 Low
5 Low
6 Low
7 Medium
8 Medium
9 High
Name: MRP_level, dtype: object

In [25]:
### Missing values in Outlet_Size
df.Outlet_Identifier.value_counts()
Out[25]:
OUT027    1559
OUT013    1553
OUT035    1550
OUT046    1550
OUT049    1550
OUT045    1548
OUT018    1546
OUT017    1543
OUT010     925
OUT019     880
Name: Outlet_Identifier, dtype: int64

In [26]:
### Outlets 10 & 19 have reported far less data than the other supermarkets.
### Let's assume it's because they are smaller and have fewer goods to offer.
df.groupby('Outlet_Identifier').agg({'Item_Identifier': len})

In [27]:
### From the above table it is clear that outlets 10 & 19 are smaller and hence carry a
### smaller number of items, as indicated by the count of item identifiers.

In [28]:
### Boxplot of Sales vs Outlet Identifier
sns.boxplot(train_data.Outlet_Identifier,train_data.Item_Outlet_Sales)
plt.xticks(rotation=45)
plt.show()

In [29]:
### Boxplot of Sales vs Outlet Type
sns.boxplot(train_data.Outlet_Type,train_data.Item_Outlet_Sales)
plt.xticks(rotation=45)
plt.show()

In [30]:
# Sales in the one type 2 supermarket appear a bit low.
# Maybe it's because it's still fairly new, having
# been founded 4 years ago.

In [31]:
### Boxplot of Sales vs Outlet Type
ax = sns.boxplot(x="Outlet_Type", y="Item_Outlet_Sales", data=train_data,hue="Outlet_Siz
e")
ax.set_xticklabels(ax.get_xticklabels(),rotation=45)
ax.legend(loc='upper left')
plt.show()

In [32]:
df.columns
Out[32]:
Index(['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
       'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Size',
       'Outlet_Location_Type', 'Outlet_Type', 'Item_Outlet_Sales', 'year',
       'MRP_level'],
      dtype='object')

In [33]:
othershops = df.groupby(['Outlet_Identifier', 'Outlet_Type', 'Outlet_Location_Type',
                         'Outlet_Size']).agg({'Outlet_Size': len})
othershops = othershops.add_suffix('_Count').reset_index()

In [34]:
### Out10 is small
df['Outlet_Size'] = np.where(df['Outlet_Identifier'] == 'OUT010', 'SMALL', df['Outlet_Size'])

In [35]:
### Boxplot of Sales vs Outlet Location Type
ax = sns.boxplot(x="Outlet_Location_Type", y="Item_Outlet_Sales", data=train_data,hue="O
utlet_Size")
ax.set_xticklabels(ax.get_xticklabels(),rotation=45)
ax.legend(loc='upper left')
plt.show()

In [36]:
### Boxplot of Sales vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=train_data,hue="Outlet_Size"
)a
x.set_xticklabels(ax.get_xticklabels(),rotation=90)
ax.legend(loc='upper left')
plt.show()


In [37]:
### Boxplot of Sales vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=train_data,hue="Outlet_Type"
)a
x.set_xticklabels(ax.get_xticklabels(),rotation=90)
ax.legend(loc='upper left')
plt.show()

In [38]:
### Boxplot of Item Visibility vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Visibility", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
ax.legend(loc='upper left')
plt.show()

In [39]:
### Boxplot of Item Visibility vs Outlet Identifier
ax = sns.boxplot(x="Outlet_Identifier", y="Item_Visibility", data=df)
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [40]:
# let's have a look at the item identifiers now,
# there are way too many of them.
# keeping only the first two letters gives us three groups:
# food, drink and non-food
df['Item_class'] = df['Item_Identifier'].str[0:2]
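
If more readable class names are wanted for reporting, the two-letter prefixes can be mapped to labels explicitly. This is only an optional sketch and is not applied in this notebook (so the value counts below still show the raw prefixes); the label strings are illustrative:

# Sketch (not executed here): map the prefixes to readable class labels
df['Item_class'] = df['Item_class'].map({'FD': 'Food', 'DR': 'Drink', 'NC': 'Non-Consumable'})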

In [41]:
df['Item_class'].value_counts()
Out[41]:
FD    10201
NC     2686
DR     1317
Name: Item_class, dtype: int64

In [42]:
### Keeping the first three letters would give higher granularity; here the same
### two-letter prefix is applied to Item_Identifier as well.
df['Item_Identifier'] = df['Item_Identifier'].str[0:2]

In [43]:
df['Item_Identifier'].value_counts()
Out[43]:
FD    10201
NC     2686
DR     1317
Name: Item_Identifier, dtype: int64

In [44]:
newdf = df.select_dtypes(exclude=['object'])

In [45]:
corr = newdf.corr()
# plot the heatmap
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)

In [46]:
# Scatter plot of Item_Outlet_Sales vs Item_MRP
fg = sns.FacetGrid(data=df, hue='Outlet_Type', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_MRP').add_legend()

In [47]:
# Scatter plot of Item_Outlet_Sales vs Item_Visibility
fg = sns.FacetGrid(data=df, hue='Outlet_Type', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_Visibility').add_legend()

In [48]:
# Scatter plot of Item_Outlet_Sales vs Item_Visibility
fg = sns.FacetGrid(data=df, hue='Outlet_Size', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_Visibility').add_legend()

In [49]:
# Scatter plot of Item_Outlet_Sales vs Item_Visibility
fg = sns.FacetGrid(data=df, hue='Outlet_Identifier', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_Visibility').add_legend()


In [50]:
### Boxplot of Sales vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [51]:
### Plenty of Outliers here. We can reduce this by dividing Item_Outlet_Sales by Item_MRP
### Boxplot of Sales vs Item Type
df['Ratio'] = df['Item_Outlet_Sales']/df['Item_MRP']
ax = sns.boxplot(x="Item_Type", y="Ratio", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [52]:

ax = sns.barplot(x="Item_Type", y="Ratio", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [53]:
# dividing sales by MRP does reduce the number of outliers
# and also emphasizes the differences between the different
# types of shop
df['Item_Outlet_Sales'] = df['Ratio']
df = df.drop(['Ratio'],axis=1)

In [54]:
# Let's see the ratio of supermarkets to grocery stores
df.Outlet_Type.value_counts()
# Although the imbalance is large, a random forest or GBM should be able to deal with
# this given enough trees (a random-forest baseline is sketched after the ensembling
# cell further below).

In [55]:
# Time to look at the data for each shop separately
def analyze_shop(shop_id):
    shopdata = df[df['Outlet_Identifier'].str.contains(shop_id)]
    # as Size, location type and type have only one level per shop, we can drop them here;
    # since the establishment year ('year') is constant within a shop, we can also remove it
    shopdata = shopdata.drop(['Outlet_Identifier',
                              'Outlet_Size',
                              'Outlet_Location_Type',
                              'Outlet_Type',
                              'year'], axis=1)
    plt.figure(1)
    plt.subplot(221)
    sns.distplot(shopdata.Item_Weight)
    plt.subplot(222)
    sns.distplot(shopdata.Item_Visibility)
    plt.subplot(223)
    sns.distplot(shopdata.Item_MRP)
    plt.subplot(224)
    sns.distplot(shopdata.Item_Outlet_Sales)
    plt.figure(2)
    ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=shopdata)
    ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
    plt.show()

In [56]:
analyze_shop('OUT018')
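
The cell above analyses a single outlet. To repeat the same per-shop analysis for every outlet in the data, a short sketch using the function defined above:

# Sketch: run analyze_shop for each outlet identifier in the combined data
for shop in sorted(df['Outlet_Identifier'].unique()):
    analyze_shop(shop)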

In [57]:
# one hot encoding
cols = df.select_dtypes(include=["object"]).columns
df2 = pd.get_dummies(df, columns=cols, drop_first=True)

In [58]:
# let's resurrect the train and test data sets
new_train = df2[:train_data.shape[0]]
new_test = df2[-test_data.shape[0]:]

In [59]:
print(new_test.shape)
print(new_train.shape)
(5681, 46)
(8523, 46)

In [60]:
target = new_train.Item_Outlet_Sales
new_train= new_train.drop('Item_Outlet_Sales',axis=1)

In [61]:
new_test = new_test.drop('Item_Outlet_Sales',axis=1)

In [62]:
new_train.to_csv('new_train.csv', sep=',', encoding='utf-8')
new_test.to_csv('new_test.csv', sep=',', encoding='utf-8')

In [63]:
# Data scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
new_train_scaled = pd.DataFrame(scaler.fit_transform(new_train), columns=new_train.columns)
new_test_scaled = pd.DataFrame(scaler.transform(new_test), columns=new_test.columns)
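
Note that the scaled copies created here are not used by the models below: the tree-based learners are insensitive to feature scale, and the linear models are fitted on the raw features. If the scaled features were to be used, e.g. for the ridge regression, a sketch would look like this (ridge_scaled is just an illustrative name):

# Sketch: fitting a linear model on the scaled features instead of the raw ones
from sklearn.linear_model import Ridge
ridge_scaled = Ridge(random_state=1)
ridge_scaled.fit(new_train_scaled, target)
scaled_prediction = ridge_scaled.predict(new_test_scaled)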

In [64]:
# ensembling of different models
from sklearn.model_selection import cross_val_score
from mlxtend.regressor import StackingRegressor
from sklearn.linear_model import Ridge, BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from xgboost.sklearn import XGBRegressor
params = {'n_estimators': 500, 'max_depth': 4, 'min_samples_split': 2,
          'learning_rate': 0.01, 'loss': 'ls'}
ridge = Ridge(random_state=1)
gbreg = GradientBoostingRegressor(**params)
bayridge = BayesianRidge()
xgb = XGBRegressor()
streg = StackingRegressor(regressors=[ridge, gbreg, bayridge],
                          meta_regressor=xgb)
for clf, label in zip([ridge, gbreg, xgb, bayridge, streg],
                      ['Ridge', 'GBR', 'XGB', 'Bayesian Ridge', 'Ensemble']):
    scores = cross_val_score(clf, new_train, target, cv=10,
                             scoring='neg_mean_squared_error')
    print("Neg. MSE: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

In [65]:
ridge.fit(new_train,target)
prediction1 = ridge.predict(new_test)

In [66]:
gbreg.fit(new_train,target)
prediction2 = gbreg.predict(new_test)

In [67]:
xgb.fit(new_train,target)
prediction3 = xgb.predict(new_test)

In [68]:
bayridge.fit(new_train,target)
prediction4 = bayridge.predict(new_test)

In [69]:
streg.fit(new_train,target)
prediction5 = streg.predict(new_test)

In [70]:
prediction = (0.3*prediction1 + 0.3*prediction4 + 0.25*prediction2 + 0.15*prediction3) * new_test.Item_MRP

In [71]:
results = test_data[['Item_Identifier','Outlet_Identifier']].copy()

In [72]:
results.loc[:, 'Item_Outlet_Sales'] = prediction

In [73]:
#results.to_csv('submission.csv', sep=',', encoding='utf-8', index=False)
