ICT706 Data Analytics Assignment

Research Project:

In this research project you will undertake a data analytics approach to solve a set of business problems that require the use of appropriately selected data processing and mining approaches.

Answer:

In [1]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
import seaborn as sns
from scipy import stats
from scipy.stats import kurtosis
from scipy.stats import skew
from sklearn.model_selection import train_test_split

Part A: Load and Clean Data
Part B: Data Exploration
Part C: Predicting Spending Levels
Part D: Predicting Big Spenders
Part E: Business Recommendations

Write Python code to load your dataset into a Pandas DataFrame called 'sales'.

In [2]:
train_data = pd.read_csv('Train_UWu5bXk.csv',header=0)
test_data = pd.read_csv('Test_u94Q5KV.csv',header=0)
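
The task statement above asks for the data to be loaded into a DataFrame called 'sales'. If that exact name is required by the marker, a minimal extra line would look like the sketch below (it assumes the training CSV is the dataset in question; the rest of the notebook continues to use train_data and test_data):

# Sketch: alias matching the task wording; assumes the training CSV is the intended dataset
sales = pd.read_csv('Train_UWu5bXk.csv', header=0)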

In [3]:
train_data.head(10)
Out[3]:

In [4]:
train_data.info()

In [5]:
train_data.describe()
Out[5]:

In [6]:
test_data['Item_Outlet_Sales'] = 0

In [7]:
df = pd.concat([train_data,test_data])

In [8]:
df.head(10)
Out[8]:

In [9]:
df.shape
Out[9]:
(14204, 12)

In [10]:
df.isnull().sum(axis = 0)
Out[10]:

In [11]:
# Fat content
print(df.Item_Fat_Content.unique())
df.loc[df.Item_Fat_Content.isin(['LF','low fat']), 'Item_Fat_Content'] = 'Low Fat'
df.loc[df.Item_Fat_Content.isin(['reg']), 'Item_Fat_Content'] = 'Regular'
print(df.Item_Fat_Content.value_counts())
print(df.Item_Type.unique())
print(df.groupby('Item_Type')['Item_Fat_Content'].count())
df.loc[df.Item_Type.isin(['Health and Hygiene','Household','Others']), 'Item_Fat_Content'] = 'None'
print(df.Item_Fat_Content.value_counts())

In [12]:
sns.boxplot(df.Item_Type, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()

In [13]:
sns.boxplot(df.Outlet_Identifier, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()

In [14]:
## OUT027 and OUT019 don't have any item weights recorded.
## Fill missing values in Item_Weight with the mean weight of the corresponding Item_Identifier.
weights_mean = df.groupby('Item_Identifier', as_index=False)['Item_Weight'].mean()
print(weights_mean.head(5))

In [15]:
df['Item_Weight'] = df.apply(
    lambda row: weights_mean.loc[
        weights_mean['Item_Identifier'] == row['Item_Identifier'], 'Item_Weight'
    ].iloc[0] if np.isnan(row['Item_Weight']) else row['Item_Weight'],
    axis=1
)
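
A more concise alternative to the row-wise apply above is a vectorized group-mean imputation (a sketch using pandas groupby/transform on the same df; it assumes every Item_Identifier has at least one recorded weight):

# Sketch: fill missing Item_Weight with the mean weight of the same Item_Identifier
df['Item_Weight'] = df['Item_Weight'].fillna(
    df.groupby('Item_Identifier')['Item_Weight'].transform('mean')
)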

In [16]:
df.isnull().sum(axis = 0)
df.Item_Weight = df.Item_Weight.astype(float)

In [17]:
sns.boxplot(df.Item_Type, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()

In [18]:
df.info()

In [19]:
sns.boxplot(df.Outlet_Identifier, df.Item_Weight)
plt.xticks(rotation=45)
plt.show()


In [20]:
df['year'] = 2013 - df.Outlet_Establishment_Year
df = df.drop(['Outlet_Establishment_Year'],axis=1)

In [21]:
df.columns
Out[21]:
Index(['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Size',
'Outlet_Location_Type', 'Outlet_Type', 'Item_Outlet_Sales', 'year'],
dtype='object')

In [22]:
sns.kdeplot(df.Item_MRP,shade=True)
plt.axvline(x=70,color="blue")
plt.axvline(x=137,color="blue")
plt.axvline(x=210,color="blue")
Out[22]:
<matplotlib.lines.Line2D at 0x2e50ee46630>

In [23]:
### There are four different price ranges. Let's introduce an MRP_level variable to account for that.
conditions = [
(df['Item_MRP'] < 70),
(df['Item_MRP'] < 137),
(df['Item_MRP'] < 210),
(df['Item_MRP'] >210)]
choices = ['Low', 'Medium', 'High','Very high']
df['MRP_level'] = np.select(conditions, choices)
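
An equivalent way to create the same bins is pd.cut (a sketch; the bin edges are the ones read off the density plot above, boundary handling at the exact cut points differs slightly, and the result is cast to str so it behaves like the np.select version during the later one-hot encoding):

# Sketch: same MRP binning with pd.cut; astype(str) keeps the column as object dtype
df['MRP_level'] = pd.cut(df['Item_MRP'],
                         bins=[0, 70, 137, 210, np.inf],
                         labels=['Low', 'Medium', 'High', 'Very high']).astype(str)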

In [24]:
df.MRP_level.head(10)
Out[24]:
0 Very high
1 Low
2 High
3 High
4 Low
5 Low
6 Low
7 Medium
8 Medium
9 High
Name: MRP_level, dtype: object

In [25]:
### Missing values in Outlet_Size
df.Outlet_Identifier.value_counts()
Out[25]:
OUT027    1559
OUT013    1553
OUT035    1550
OUT046    1550
OUT049    1550
OUT045    1548
OUT018    1546
OUT017    1543
OUT010     925
OUT019     880
Name: Outlet_Identifier, dtype: int64

In [26]:
### Outlets 10 & 19 have reported far less data than the other supermarkets.
### Let's assume it's because they are smaller and have fewer goods to offer.
df.groupby('Outlet_Identifier').agg({'Item_Identifier': len})

In [27]:
### From the above table it is clear that outlets 10 & 19 are smaller and hence carry a
### smaller number of items, as indicated by the count of item identifiers.

In [28]:
### Boxplot of Sales vs Outlet Identifier
sns.boxplot(train_data.Outlet_Identifier,train_data.Item_Outlet_Sales)
plt.xticks(rotation=45)
plt.show()

In [29]:
### Boxplot of Sales vs Outlet Type
sns.boxplot(train_data.Outlet_Type,train_data.Item_Outlet_Sales)
plt.xticks(rotation=45)
plt.show()

In [30]:
# Sales in the one type 2 supermarket appear a bit low.
# Maybe it's because it's still fairly new, having
# been founded 4 years ago.

In [31]:
### Boxplot of Sales vs Outlet Type
ax = sns.boxplot(x="Outlet_Type", y="Item_Outlet_Sales", data=train_data,hue="Outlet_Siz
e")
ax.set_xticklabels(ax.get_xticklabels(),rotation=45)
ax.legend(loc='upper left')
plt.show()

In [32]:
df.columns
Out[32]:
Index(['Item_Identifier', 'Item_Weight', 'Item_Fat_Content', 'Item_Visibility',
       'Item_Type', 'Item_MRP', 'Outlet_Identifier', 'Outlet_Size',
       'Outlet_Location_Type', 'Outlet_Type', 'Item_Outlet_Sales', 'year',
       'MRP_level'],
      dtype='object')

In [33]:
othershops = df.groupby(['Outlet_Identifier', 'Outlet_Type', 'Outlet_Location_Type',
                         'Outlet_Size']).agg({'Outlet_Size': len})
othershops = othershops.add_suffix('_Count').reset_index()

In [34]:
### Out10 is small
df['Outlet_Size'] = np.where(df['Outlet_Identifier'] == 'OUT010', 'SMALL', df['Outlet_Size'])

In [35]:
### Boxplot of Sales vs Outlet Location Type
ax = sns.boxplot(x="Outlet_Location_Type", y="Item_Outlet_Sales", data=train_data,hue="O
utlet_Size")
ax.set_xticklabels(ax.get_xticklabels(),rotation=45)
ax.legend(loc='upper left')
plt.show()

In [36]:
### Boxplot of Sales vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=train_data,hue="Outlet_Size"
)a
x.set_xticklabels(ax.get_xticklabels(),rotation=90)
ax.legend(loc='upper left')
plt.show()


In [37]:
### Boxplot of Sales vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=train_data,hue="Outlet_Type"
)a
x.set_xticklabels(ax.get_xticklabels(),rotation=90)
ax.legend(loc='upper left')
plt.show()

In [38]:
### Boxplot of Item Visibility vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Visibility", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
ax.legend(loc='upper left')
plt.show()

In [39]:
### Boxplot of Item Visibility vs Outlet Identifier
ax = sns.boxplot(x="Outlet_Identifier", y="Item_Visibility", data=df)
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [40]:
# let's have a look at the item identifiers now,
# there are way too many of them.
# keeping only the first two letters gives us three groups:
# food, drink and non-food
df['Item_class'] = df['Item_Identifier'].str[0:2]
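
If more readable class names are wanted for reporting, the two-letter prefixes can be mapped to labels explicitly. This is only an optional sketch and is not applied in this notebook (so the value counts below still show the raw prefixes); the label strings are illustrative:

# Sketch (not executed here): map the prefixes to readable class labels
df['Item_class'] = df['Item_class'].map({'FD': 'Food', 'DR': 'Drink', 'NC': 'Non-Consumable'})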

In [41]:
df['Item_class'].value_counts()
Out[41]:
FD    10201
NC     2686
DR     1317
Name: Item_class, dtype: int64

In [42]:
### Keeping the first three letters would give higher granularity; here the same
### two-letter prefix is applied to Item_Identifier as well.
df['Item_Identifier'] = df['Item_Identifier'].str[0:2]

In [43]:
df['Item_Identifier'].value_counts()
Out[43]:
FD    10201
NC     2686
DR     1317
Name: Item_Identifier, dtype: int64

In [44]:
newdf = df.select_dtypes(exclude=['object'])

In [45]:
corr = newdf.corr()
# plot the heatmap
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)

In [46]:
# Scatter plot of Item_Outlet_Sales vs Item_MRP
fg = sns.FacetGrid(data=df, hue='Outlet_Type', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_MRP').add_legend()

In [47]:
# Scatter plot of Item_Outlet_Sales vs Item_Visibility
fg = sns.FacetGrid(data=df, hue='Outlet_Type', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_Visibility').add_legend()

In [48]:
# Scatter plot of Item_Outlet_Sales vs Item_Visibility
fg = sns.FacetGrid(data=df, hue='Outlet_Size', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_Visibility').add_legend()

In [49]:
# Scatter plot of Item_Outlet_Sales vs Item_Visibility
fg = sns.FacetGrid(data=df, hue='Outlet_Identifier', aspect=1.61)
fg.map(plt.scatter, 'Item_Outlet_Sales', 'Item_Visibility').add_legend()


In [50]:
### Boxplot of Sales vs Item Type
ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [51]:
### Plenty of Outliers here. We can reduce this by dividing Item_Outlet_Sales by Item_MRP
### Boxplot of Sales vs Item Type
df['Ratio'] = df['Item_Outlet_Sales']/df['Item_MRP']
ax = sns.boxplot(x="Item_Type", y="Ratio", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [52]:

ax = sns.barplot(x="Item_Type", y="Ratio", data=df,hue="Outlet_Type")
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.show()

In [53]:
# dividing sales by MRP does reduce the number of outliers
# and also emphasizes the differences between the different
# types of shop
df['Item_Outlet_Sales'] = df['Ratio']
df = df.drop(['Ratio'],axis=1)

In [54]:
# Let's see the ratio of supermarkets to grocery stores
df.Outlet_Type.value_counts()
# Although the imbalance is large, a random forest or GBM should be able to deal with
# this given enough trees (a random-forest baseline is sketched after the ensembling
# cell further below).

In [55]:
# Time to look at the data for each shop separately
def analyze_shop(shop_id):
    shopdata = df[df['Outlet_Identifier'].str.contains(shop_id)]
    # as Size, location type and type have only one level per shop, we can drop them here;
    # since the establishment year ('year') is constant within a shop, we can also remove it
    shopdata = shopdata.drop(['Outlet_Identifier',
                              'Outlet_Size',
                              'Outlet_Location_Type',
                              'Outlet_Type',
                              'year'], axis=1)
    plt.figure(1)
    plt.subplot(221)
    sns.distplot(shopdata.Item_Weight)
    plt.subplot(222)
    sns.distplot(shopdata.Item_Visibility)
    plt.subplot(223)
    sns.distplot(shopdata.Item_MRP)
    plt.subplot(224)
    sns.distplot(shopdata.Item_Outlet_Sales)
    plt.figure(2)
    ax = sns.boxplot(x="Item_Type", y="Item_Outlet_Sales", data=shopdata)
    ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
    plt.show()

In [56]:
analyze_shop('OUT018')
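
The cell above analyses a single outlet. To repeat the same per-shop analysis for every outlet in the data, a short sketch using the function defined above:

# Sketch: run analyze_shop for each outlet identifier in the combined data
for shop in sorted(df['Outlet_Identifier'].unique()):
    analyze_shop(shop)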

In [57]:
# one hot encoding
cols = df.select_dtypes(include=["object"]).columns
df2 = pd.get_dummies(df, columns=cols, drop_first=True)

In [58]:
# let's resurrect the train and test data sets
new_train = df2[:train_data.shape[0]]
new_test = df2[-test_data.shape[0]:]

In [59]:
print(new_test.shape)
print(new_train.shape)
(5681, 46)
(8523, 46)

In [60]:
target = new_train.Item_Outlet_Sales
new_train= new_train.drop('Item_Outlet_Sales',axis=1)

In [61]:
new_test = new_test.drop('Item_Outlet_Sales',axis=1)

In [62]:
new_train.to_csv('new_train.csv', sep=',', encoding='utf-8')
new_test.to_csv('new_test.csv', sep=',', encoding='utf-8')

In [63]:
# Data scaling
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
new_train_scaled = pd.DataFrame(scaler.fit_transform(new_train), columns=new_train.columns)
new_test_scaled = pd.DataFrame(scaler.transform(new_test), columns=new_test.columns)
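
Note that the scaled copies created here are not used by the models below: the tree-based learners are insensitive to feature scale, and the linear models are fitted on the raw features. If the scaled features were to be used, e.g. for the ridge regression, a sketch would look like this (ridge_scaled is just an illustrative name):

# Sketch: fitting a linear model on the scaled features instead of the raw ones
from sklearn.linear_model import Ridge
ridge_scaled = Ridge(random_state=1)
ridge_scaled.fit(new_train_scaled, target)
scaled_prediction = ridge_scaled.predict(new_test_scaled)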

In [64]:
# ensembling of different models
from sklearn.model_selection import cross_val_score
from mlxtend.regressor import StackingRegressor
from sklearn.linear_model import Ridge, BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from xgboost.sklearn import XGBRegressor
params = {'n_estimators': 500, 'max_depth': 4, 'min_samples_split': 2,
          'learning_rate': 0.01, 'loss': 'ls'}
ridge = Ridge(random_state=1)
gbreg = GradientBoostingRegressor(**params)
bayridge = BayesianRidge()
xgb = XGBRegressor()
streg = StackingRegressor(regressors=[ridge, gbreg, bayridge],
                          meta_regressor=xgb)
for clf, label in zip([ridge, gbreg, xgb, bayridge, streg],
                      ['Ridge', 'GBR', 'XGB', 'Bayesian Ridge', 'Ensemble']):
    scores = cross_val_score(clf, new_train, target, cv=10,
                             scoring='neg_mean_squared_error')
    print("Neg. MSE: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

In [65]:
ridge.fit(new_train,target)
prediction1 = ridge.predict(new_test)

In [66]:
gbreg.fit(new_train,target)
prediction2 = gbreg.predict(new_test)

In [67]:
xgb.fit(new_train,target)
prediction3 = xgb.predict(new_test)

In [68]:
bayridge.fit(new_train,target)
prediction4 = bayridge.predict(new_test)

In [69]:
streg.fit(new_train,target)
prediction5 = streg.predict(new_test)

In [70]:
prediction = (0.3*prediction1 + 0.3*prediction4 + 0.25*prediction2 + 0.15*prediction3) * new_test.Item_MRP

In [71]:
results = test_data[['Item_Identifier','Outlet_Identifier']].copy()

In [72]:
results.loc[:, 'Item_Outlet_Sales'] = prediction

In [73]:
#results.to_csv('submission.csv', sep=',', encoding='utf-8', index=False)
