Heatmap/class_presentation.py at master · JohnAskew/Heatmap · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
#!/usr/bin/env python
# coding: utf-8

# <a id='title'></a>
# # <bold><font color=gray>Jupyter App:</font><font color=darkpink> A Student's Analysis of his Domestic Thermal Footprint</font></bold>
# ## <bold><font color=white>=================</font><font color=purple>Or, how the Raspberry Pi saved me $5,000.</font>

# ## <font size=5 color=darkpink>Abstract:</font>
#
# Concerned with utility costs incurred during summer peak months, this project aimed to explore ways to stabilize and reduce cooling costs inside residential homes. Before making any financial commitments to upgrade cooling equipment, we investigated ways to improve efficiencies.  Lacking current information about the "domestic thermal qualities" or HVAC and insulation efficiencies, we set out to study the impact of the external environment on our home's a/c usage as well as to assess current a/c unit’s cooling abilities. The challenge was to make reliable assessments within a budget. The first objective was to evaluate the a/c cooling unit's effectiveness. The second objective was to determine if the overall goal could be met economically, by making small changes indoors to promote cooling effeciency in a substantial way. <font color=darkpink>**Methods**:</font> Raspberry Pi microcomputers were fitted with sensors to measure temperature and humidity and then stream the data to cloud. The sensors were a collection of DHT22 and DHT11 sensors, each recording temperature and humidity through the duration of the study. Separately, a weather polling service using python was scripted to record the weather information every 10 minutes, in 4 parts of the city, including the area where the residential study was conducted. <font color=darkpink>**Results:**</font> We found the temperature and humidity fluctuates widely from one part of the city to another, and that the humidity was negatively correlated with temperature. Concerning the indoors study, the temperature in the kitchen (M=77.3, S.D.=.75), the living room (M=77.1, SD=.83), the master bedroom (M=73.71, SD=.76) and the guest bedroom (M=79.3, SD=.60))** varied widely. A similar trend was observed with respect to humidity in the kitchen (M=47.1, SD=2.52), living room (M=48.9, SD=2.77), master bedroom (M=54.5, SD=2.02) and guest bedroom (M=46.4, SD=1.70). **Aggregating across all 4 rooms, humidity was weakly negatively correlated with temperature**. The highest temperatures were observed in a guestroom containing multiple electronic devices, including several computer servers, and this same room had a window facing the sun with minimal window covering. Given that the a/c control unit was attempting to achieve one stable temperature throughout the house, and that the coldest room was generally 2 degrees cooler than the remaining rooms, it appears that the ventilation system was overcooling 1 room in order to compensate for undercooling in the hottest room. On a positive note, the range of humidity values collected internally showed the a/c is effective in removing humidity (see heat index). As the temperature in the majority of rooms stayed at within a consistent range near the HVAC thermostat temperature setting, we see the a/c's ability to cool is acceptable..**The correlation between weather and internal environment for temp was 4.6% and the humidity was even less at 3%.** This confirms the effectiveness of the a/c cooling unit. <font color=darkpink>**Conclusions:**</font> A substantial HVAC expenditure is not needed at this time. Restricting the amount of direct sunlight on the hottest room, engaging a dehumidifier near the thermostat, and restricting the ventilation going to the coldest room, will likely balance the rates at which each room cools and reduce the amount of time the a/c is engaged and lead to substatial cost savings.

#
#
# ## <font size=5 color=gray>Notebook Purpose: </font><font color=darkblue>Process and Present Data using Pandas </font><font color=darkblue> including:</font>
# <font color=white size=1>==========================================================================================================================================================================</font>
# <font size=2  color=gray>> using </font><font color=darkblue>**JSON, Lists, Dictionaries, Loops, Dictionary Writer, Z-scores with SciPy, Heat-Maps with Seaborn and Visualizing with MatPlotLib**</font>
# <font color=white size=1>==========================================================================================================================================================================</font>
# <font size=2  color=gray> via ugly colors and links.</font>
# <a href ='#top'>Jump to Table of Contents</a>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a>
#
# ### <a id='top'></a>
# <font color=black size=1>=============================================================================================================</font>
# ### <font size=5 color=darkpink>Table of Contents</font>
# <font color=black size=1>=============================================================================================================</font>
#
# <a href='#methods'><font color=darkpink>**Methods**</font></a>
#
# <a href='#section1'><font color=darkblue>**Section1:     Process Atlanta weather data**</font></a>
#
# <a href='#sec1pt1'>Part 1 - Read in raw JSON, write out CSV spreadsheet</a>
#
# <a href='#sec1pt2'>Part 2 - Review Weather MetaData</a>
#
# <a href='#sec1pt3'>Part 3 - Data Edits - Format DATES</a>
#
# <a href='#sec1pt4'>Part 4 - Breakout DataFrames by Location</a>
#
# <a href='#sec1pt5'>Part 5 - Visualize Weather Data for Completeness</a>
#
# <a href='#section2'><font color=darkblue>**Section2:     Process Home Environment (Internal Data)**</font></a>
#
# <a href='#sec2pt1'>Part 1 - Read, write and remove dups from Environment Data CSV spreadsheet</a>
#
# <a href='#sec2pt2'>Part 2 - Review Environment MetaData</a>
#
# <a href='#sec2pt3'>Part 3 - Renaming Columns and Replacing Data</a>
#
# <a href='#sec2pt4'>Part 4 - Visualize and Assess Environment Data</a>
#
# <a href='#results'><font color=darkpink>**Results**</font></a>
#
# <a href='#sec3pt1'>Part 1 - Present Weather findings</a>
#
# <a href='#sec3pt2'>Part 2 - Present Environment findings</a>
#
# <a href='#sec3pt3'>Part 3 - Present All Data together</a>
#
# <a href='#conclusions'><font color=darkpink>**Conclusions**</font></a>
#
# <a href='#section4'><font color=darkblue>**Section1:    Comparative Analysis**</font></a>
#
# <a href='#sec4pt2'><font color=darkblue>**Section2:     Conclusions**</font></a>
#
# <a href='#sec4pt3'><font color=darkblue>**Section3:     Limitations and Future Plans**</font></a>

# In[60]:


# Set the my_dir to a path you want to save figures in this notebook.
# if you do not want to save images, simply set my_dir to blanks
# example: my_dir ='' (you just uncomment out this line...)
import os
my_dir = 'C:/users/bucbo_000/desktop/python/images'
#my_dir=''
if os.path.exists(my_dir):
    if os.path.isdir(my_dir):
        os.chdir(my_dir)
else:
    my_dir = ''


# <a id='methods'></a>
# <font color=black size=1>=============================================================================================================</font>
#
#
# <font color=darkpink size=10>Methods</font>
#
# <font color=black size=1>=============================================================================================================</font>
#
# <a href ='#top'>Jump to Table of Contents</a>

# <font color=black>This project is part of an larger, ongoing IOT project. In particular, where the sensors could be monitoring a warehouse to assist in environmental controls.
#
# **Initial development**
#
# The home was chosen a location for proof of concept, with zero cost and complete control over location. The idea was to complete P.O.C. and then build to incorporate additional robust sophistication when implementing in larger or more complex locations where the environment needs managing. Sensors more accurate than the ones we chose were considered cost prohibitive and overly complex for a P.O.C using ARM microcomputers (the Raspberry PI). Eventually this proof of concept functionality will be absorbed by the larger, ongoing IOT project.
#
#
# <font size=10 color=white></font>
# **Locations**
#
# Testing in a home was the most viable location, in terms of cost and usage. At the time of this project, acquiring access to a warehouse remained pending and considered out of reach. Other IOT projects that tied into the larger IOT project were tested and demonstrated in an office complex presentation room. This too, was considered to be out of reach. Concerning the weather, 4 parts of Atlanta were tracked, The north side of the perimeter, Marietta (west), Stone Mountain (east) and downtown at the Coca-Cola Olympic Village (south). The rational was taking a radius between the home and the downtown Capital and extending east and west. Latitude and Longitude were captured as well as weather information, for later use in visualizing the location (if desired), allowing for exact determination of where the weather was being extracted.
#
# **Measures**
#
# The sensors were not calibrated. Upon initial activation of the sensor, the data captured was compared to a true thermostat. Any sensor reporting temperature measurements beyond 5 celsius were discarded and replaced. We found a high correspondence between the accuracy of reported temp and the accuracy of reported humidity. Restating, we found when a sensor reported an unacceptable temperature, we found the humidity was also highly skewed. Take note, the DHT22 sensor, which is more accurate than the DHT11 sensor, requires a 10K resistor to keep the microcomputer port current from impacting the reported signal - the reported temp and humidity. The additional cost was less than .01 (actual cost is 2.00 - as buying resistors in bulk is cheaper than purchasing individually).
#
# <img src="mqtt-pandas.png" width=1000, height=200>
# <img src="image.png">
#
# **Hardware Specifics** compliments of https://www.mouser.com/ds/2/737/dht-932870.pdf and https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/
#
#

# # <a id='section1'></a>
# # <font color=darkpink>Section 1: Process Atlanta Weather Data</font>
# <a href ='#top'>Jump to Table of Contents</a>

# <a id='sec1pt1'></a>
# ## <font color=darkblue>Part 1 - Read in raw JSON, write out CSV spreadsheet</font>
#
# <a href ='#top'>Jump to Table of Contents</a>
#
# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Edit check for valid JSON format.</font>
# </div>

# In[2]:


import csv as csv
import json
from pandas.io.json import json_normalize
import matplotlib
import matplotlib.pyplot as plt
import os
import pandas as pd
import seaborn as sns
from scipy import stats
from matplotlib import style
#------------------------------------------------#
# Set processing parameters and directives
#------------------------------------------------#
#matplotlib.use('nbagg')
style.use('fivethirtyeight')
get_ipython().run_line_magic('matplotlib', 'inline')
plt.rcParams['figure.figsize'] = (16,12)
plt.rcParams['font.size'] = 8
#------------------------------------------------#
# Set the appropriate path
#------------------------------------------------#
home_path = 'C:\\users\\bucbo_000\\Desktop'

if os.path.isdir(home_path):
    os.chdir(home_path)
else:
    home_path='./'


#------------------------------------------------#
#Open files for JSON read and write LIST
#------------------------------------------------#
data_file = open('weather_collect.txt', "r", encoding = 'utf-8')

#-------------------------------------------------#
# Convert json to list of dictionaries, then parse accordingly
# IF JSON is INVALID - the json.loads will error. We
# trap the error and simply move onto the next record.
#-------------------------------------------------#

f1 = data_file.readlines()
my_df = list()
for x in f1:
    try:
        data = json.loads(x)
        df = data
        my_df.append(df)
    except:
        print("skipping: JSON format invalid for:", x)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Write valid data to list of dictionaries for consumption by dictionary writer, a child class of cvs module.</font>
# </div>
#

# In[3]:


#------------------------------------------------#
# Define output file and header line
#------------------------------------------------#
with open('weather.csv', 'w', newline='') as csvfile:
    fieldnames = ['sysdate', 'loc', 'temp', 'hum','forecast', 'lat', 'lon']
# This opens the `DictWriter`.
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
#
# Write out the header row (this only needs to be done once!).
    writer.writeheader()
#------------------------------------------------#
# Read in, Write out and Loop
#------------------------------------------------#
    for a_df in my_df:
        try:
            writer.writerow(a_df)
        except:
            print("Unable to parse, skipping record...")


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>: Build DataFrame from list of dictionaries.</font>
# </div>

# In[4]:


ext_df = pd.read_csv(home_path + "\\weather.csv", usecols=range(0,5))


# In[5]:


plt.rcParams['figure.figsize'] = (20,15)
plt.rcParams['font.size'] = 15

a = ext_df.groupby(['loc', 'forecast'])['temp'].mean().unstack().dropna(axis = 0)
x = list(a)
x


slices = [8,8,8,8,8,8,8,8,8,8,8,8,8,8]
activities = x
cols=['m','g','y','b','r','k','pink']

plt.pie(slices, labels=activities
        , startangle=90
        , colors=cols
        , shadow=True
        , explode=(0, 0,0,0,0,0,0,0,0,0,0,0.2,0,0)
#        , autopct = '%1.1f%%'
       )
plt.title('Weather\nExploratory Investigation\nReview Metadata')

if my_dir:
    plt.savefig(my_dir + '/' + 'Review_Exploratory_Metadata.png')


# <a id='sec1pt2'></a>
# ### <font color=darkblue>Part 2 - Review Weather MetaData.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>
#
# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Visualize Metadata and look for holes in data.</font>
# </div>
#
# <font size=3 color=blue>Figure 1a. Displays the raw weather metadata collected to identify if data is missing.</font>

# In[6]:


#----------------------------------------#
# This WILL NOT RUN with NaN records, so...
#    using dropna() function to strip all
#    records that have nulls in them.
# This assessment is based on this data,
#    and should not be considered a best
#    practice.
#----------------------------------------#
for col in ext_df.columns:
    plot_data = ext_df[col].dropna()
    fig, ax = plt.subplots()
    ax.plot(plot_data.values,plot_data.index.values,  label=col)

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_1a_review_weather_metadata_raw.png')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Show DataFrame sample.</font>
# </div>
#
#
# <font size=3 color=blue>Table 1a. displays a random sampling of the weather data collected</font>

# In[7]:


ext_df.sample(n=10)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>:  Review summary of weather data collected.</font>
# </div>
#
# <font size=3 color=blue>Table 1b. Summarizes the mean temperature and humidity  by location and forecast</font>

# In[8]:


x = ext_df.groupby(['forecast', 'loc'])['temp', 'hum'].mean().unstack().fillna('-')
x


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 4</b> </font><font color=darkblue>: Review raw data Heat Map.</font>
# </div>
#
# <font size=3 color=blue>Figure 1b. Summarizes the raw data as a Heat Map to supplement Table 1b results.</font>
#
# <font size=3 color=purple>Notice that Coca-Cola Olympic Village has the highest diversity in forecasts.</font>

# In[9]:


corr =  ext_df.groupby(['forecast', 'loc'])['temp'].mean().unstack()
ax = sns.heatmap(
    corr,

    cmap=sns.diverging_palette(10, 220, n=200),
    square=True
)
ax.set_xticklabels(
    ax.get_xticklabels(),
    rotation=45,
    horizontalalignment='right'
);

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_1b_heatmap_atlanta_weather.png')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 5</b> </font><font color=darkblue>: Review content metadata stats.</font>
# </div>
#
# <font size=3 color=blue>Table 1d. displays the weather ***metadata*** statistics</font>

# In[10]:


print("------------------------------------")
print("Weather Data has ", ext_df.shape[0], "Rows and", ext_df.shape[1], "Columns of types:")
print("------------------------------------")
print(ext_df.dtypes)
print(" ")
print("------------------------------------")
print(f"Weather Data nulls search:")
print("------------------------------------")
print(ext_df.isnull().sum())
print(" ")
print()
print("------------------------------------")
print(f"Weather Data counts (raw)")
print("------------------------------------")
for i in ext_df.columns:
    print("The count for", i, "is", ext_df[i].count())
print(" ")
print("------------------------------------")
print(f"Weather Data counts excluding nulls:")
print("------------------------------------")
print(ext_df.dropna().count())
print(" ")
print("------------------------------------")
print(f"Weather Data statistics:")
print("------------------------------------")
ext_df.describe([0])


# <font size=1 color=black>===============================================================================================================</font>
# ## <font color=darkpink>Clean the weather data.</font>
# <font size=1 color=black>===============================================================================================================</font>

# ### <font color=red>Discrepancy!</font><font color=darkgray> The </font><font color=darkorange>weather</font><font color=darkgray> humidity is in </font><font color=darkorange> percentages</font><font color=darkgray> while the </font><font color=blue> indoors </font><font color=darkgray> humidity is in </font><font color=blue>integers</font>

# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 6</b> </font><font color=darkblue>: Convert the humidity before moving on.</font>
# </div>

# In[11]:


ext_df['hum'] = (ext_df['hum'] * 100)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 7</b> </font><font color=darkblue>: Drop null weather data.</font>
# </div>

# In[12]:


ext_df.dropna(axis=0, inplace=True)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 8</b> </font><font color=darkblue>: Take sample and verify results.</font>
# </div>
#
# <font size=3 color=blue>Table 1e. Shows a sample of the cleaned weather ***data*** </font>

# In[13]:


ext_df.sample(n=5)


# <a id='sec1pt3'></a>
# ## <font color=darkblue>Part 3 - Format DATES and build indicies.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>
#
# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Format the sysdate column to be a pandas DATE object.</font></div>

# In[14]:


ext_df['sysdate'] = pd.to_datetime(ext_df['sysdate'])


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Validate the DATE conversion worked.</font></div>
#
# <font size=3 color=blue>Table 1f. Shows the data types of columns - to validate sysdate is now a datetime object </font>

# In[15]:


ext_df.dtypes


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>: Build index for visualization preparation.</font></div>
#
# <a href ='#top'>Jump to Table of Contents</a>

# In[16]:


ext_df.set_index(['sysdate'], inplace=True)
ext_df.sort_index(inplace=True)


# <a id='sec1pt4'></a>
# ## <font color=darkblue>Part 4 - Breakout DataFrames by Location.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>

# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Create new dataframes by location in city.</font></div>

# In[17]:


ext_atl = ext_df[ext_df['loc'] == 'Atlanta Georgia']
ext_marietta = ext_df[ext_df['loc'] == 'Big Chicken Marietta Georgia']
ext_stonemtn = ext_df[ext_df['loc'] == 'Stone Mountain Park']
ext_coke = ext_df[ext_df['loc'] == 'Coca-Cola Olympic Park']


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Take sub-group sample stat to ensure dataframes built correctly.</font></div>
#
# <font size=3 color=blue>Table 1g. Shows a breakout of forecasts.</font>

# In[18]:


ext_atl['forecast'].value_counts()


# <a id='sec1pt5'></a>
# ## <font color=darkblue>Part 5 - Visualize External Data for Completeness.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>

# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Breakout external data by groups.</font></div>

# In[19]:


ext_df.drop(columns = ['forecast']).groupby(['loc']).count()


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Merge external data groups.</font></div>

# In[20]:


ow = ext_atl.join(ext_marietta, how='left', lsuffix='_atl', rsuffix='_marietta')
sw = ext_coke.join(ext_stonemtn, how='left', lsuffix='_coke', rsuffix='_stnmtn')
outside = ow.join(sw, how='left')
outside.sort_index(inplace=True)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>: Do basic plot as a test.</font></div>
#
# ### <font color = blue size=3>Figure 1b - Initial display of weather temp and humidity</font>
#
#  <font color=purple size=3> This actually discloses a lot, but we avoid details, as we stay on track of presentation.</font>

# In[21]:


outside.plot(figsize=(22,11))
plt.xlabel('Dates', color='green')
plt.ylabel('Temperature and Humidity', color='red')
plt.title('External Weather Raw Data\n--------\nTemperature and Humidity', color='blue')

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_1b_weather_temp_hum_raw_data.png')


#
# ### <font color = blue size=3>Figure 1c - Same data as figure 1b, but smoothed with rolling mean</font>

# In[22]:


ext_df['avg_temp'] = ext_atl['temp'].rolling(10).mean().dropna()
ext_df['avg_hum'] = ext_atl['hum'].rolling(10).mean().dropna()
ax = ext_df['avg_hum'].plot()
ax1 = ext_df['avg_temp'].plot(ax = ax, figsize=(22,11))
plt.xlabel('Dates', color='green')
plt.ylabel('Temperature and Humidity', color='red')
plt.title("Derived Mean of External Temperature and Humidity\nSmoothed by Rolling Mean of 10 samples", color='blue')
plt.legend()
if my_dir:
    plt.savefig(my_dir + '/' + 'fig_1c_weather_temp_hum_raw_data.png')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 4</b> </font><font color=darkblue>: Present external temperatures.</font></div>
#
# ### <font color = blue size=3>Figure 1d - Initial display of weather temp</font>

# In[23]:


outside.drop(columns=['hum_atl', 'hum_coke','hum_stnmtn', 'hum_marietta']).plot(figsize=(22,11))
plt.xlabel('Dates', color='green')
plt.ylabel('Temperature', color='red')
plt.title('Composite of Temperature Data collected from Weather Service', color='blue')

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_1d_weather_temp.png')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 5</b> </font><font color=darkblue>: Present external humidities.</font></div>
#
# ### <font color = blue size=3>Figure 1e - Initial display of weather humidity</font>

# In[24]:


outside.drop(columns=['temp_atl', 'temp_coke','temp_stnmtn', 'temp_marietta']).plot(figsize=(22,11))
plt.xlabel('Dates', color='green')
plt.ylabel('Humidity', color='red')
plt.title('Composite of Humidity Data collected from Weather Service', color='blue')

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_1e_weather_humidity.png')


# <a id='section2'></a>
# # <font color=darkpink>Section 2: Process Home Environment(Indoor Data)</font>
# <a href ='#top'>Jump to Table of Contents</a>
#

# ### <a id='sec2pt1'></a>
# ## <font color=darkblue>Part 1 - Read, write and remove dups from environment data csv spreadsheet</font>
#
#
# <a href ='#top'>Jump to Table of Contents</a>
#
# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Edit and Build step combined - See Section 1 for step by step details.</font></div>
#
# ### <font color = blue size=3>Table 2a - Shows indoor data that was cleaned off or rejected</font>

# In[25]:


import csv as csv
import datetime
import json
from pandas.io.json import json_normalize
import matplotlib
import matplotlib.pyplot as plt
import os
import pandas as pd
import seaborn as sns

# Set the appropriate path

#------------------------------------------------#
# Set processing parameters and directives
#------------------------------------------------#
matplotlib.use('nbagg')
get_ipython().run_line_magic('matplotlib', 'inline')
plt.rcParams['figure.figsize'] = (16,12)
plt.rcParams['font.size'] = 8
#------------------------------------------------#
# Set the appropriate path
#------------------------------------------------#


home_path = 'C:\\users\\bucbo_000\\Desktop'

if os.path.isdir(home_path):
    os.chdir(home_path)
else:
    home_path='./'

#Open files for read and write - write headings first
data_file = open('mqtt.txt', "r", encoding = 'utf-8')

# Convert json to list of dictionaries, then parse accordingly

f1 = data_file.readlines()
my_df = list()
for x in iter(f1):
    try:
        data = json.loads(x)
        df = data #json_normalize(data)
        my_df.append(df)
    except:
        print("skipping: JSON format invalid for:", x)

cnt_accept = 0
cnt_reject =0
content_df = list()
for i in enumerate(my_df):
     for k,v in enumerate(i):
        #print(i)
        #print("v=", v)
        #if str(v).startswith('{\'device') and "loc" in str(v):
        if "env" not in str(i):
            content_df.append(my_df)
            cnt_accept +=1
        else:
            print("skipping: JSON content invalid")
            cnt_reject +=1

print("We now have", cnt_accept, "records to use after cleaning",cnt_reject, "rows")

cnt_accept = 0
cnt_reject = 0

# Define output file and header line
with open('mqtt.csv', 'w', newline='') as csvfile:
    fieldnames = ['device','sysdate', 'temp', 'hum', 'state']
# This opens the `DictWriter`.
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
#
# Write out the header row (this only needs to be done once!).
    writer.writeheader()
#
# Read in, Write out and Loop
#
    for a_df in my_df:
        try:
            writer.writerow(a_df)
            cnt_accept += 1
        except:
            print("Unable to parse, skipping...")
            cnt_reject += 1

print("We now have", cnt_accept, "Dictionary records to use after cleaning",cnt_reject, "rows")


int_df = pd.read_csv(home_path + "\\mqtt.csv", usecols=range(0,4))

#######################################################
#
int_df = int_df.drop_duplicates()
#
#######################################################

int_df['sysdate'] = pd.to_datetime(int_df['sysdate'])


# <a id='sec2pt2'></a>
# ## <font color=darkblue>Part 2 - Review Environment Metadata</font>
#
# <a href ='#top'>Jump to Table of Contents</a>
#
# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Visualize Metadata and look for holes in data.</font>
# </div>
#
#
# <font size=3 color=blue>Figure 2a. Displays the raw indoor metadata collected to identify if data is missing.</font>

# In[26]:


for col in int_df.columns:
    plot_data = int_df[col].dropna()
    fig, ax = plt.subplots()
    ax.plot(plot_data.index.values, plot_data.values, label=col)
    plt.ylabel(col)
    plt.title("MetaData Dump\nfor review of missing data")

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_2a_indoor_raw_metadata_visual.png')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Review raw Metadata statistics.</font>
# </div>

# In[27]:


print("------------------------------------")
print("Environment Data has ", int_df.shape[0], "Rows and", int_df.shape[1], "Columns of types:")
print("------------------------------------")
print(int_df.dtypes)
print(" ")
print("------------------------------------")
print(f"Environment Data nulls search:")
print("------------------------------------")
print(int_df.isnull().sum())
print(" ")
print("------------------------------------")
print(f"Environment Data counts (raw)")
print("------------------------------------")
for i in int_df.columns:
    print("The count for", i, "is", int_df[i].count())
print("------------------------------------")
print(f"Environment Data counts:")
print("------------------------------------")
print(int_df.dropna().count())
print(" ")
print("------------------------------------")
print(f"Environment Data statistics:")
print("------------------------------------")
int_df.describe()
print(f"device cardinality:", int_df['device'].value_counts())
print(f"temp mean:", int_df['temp'].value_counts().mean())
print(f"humidity mean:", int_df['hum'].value_counts().mean())


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>: Review indoor temp for outliers.</font>
# </div>
#
# ### <font color = blue size=3>Figure 2b - Initial display of indoor temp</font>

# In[28]:


int_df['temp'].plot(color='lightblue')
new_df = int_df.copy()
plt.xlabel('Dates', color='green')
plt.ylabel('Temperature', color='red')
plt.title('Indoor Temperature Raw Data', color='blue')
plt.legend()

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_2b_indoor_temp_raw.png')


# ### <font color = blue size=3>Figure 2c - Same data as Figure 2b - but presented using rolling mean</font>

# In[29]:


int_df['temp'].rolling(10).mean().plot(color='lightblue')
new_df = int_df.copy()
plt.xlabel('Dates', color='green')
plt.ylabel('Temperature', color='red')
plt.title('Indoor Temperature Raw Data\nSmoothed using Rolling Mean of 10 samples', color='blue')
plt.legend()

if my_dir:
    plt.savefig(my_dir + '/' + 'fig_2c_indoor_temp_raw.png')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 4</b> </font><font color=darkblue>: Review summary of indoor data collected.</font>
# </div>
#
# <font size=3 color=blue>Table 2b. Displays the temperature and humidity  by room.</font>

# In[30]:


import matplotlib.pyplot as plt
from matplotlib import style

style.use('ggplot')

get_ipython().run_line_magic('matplotlib', 'inline')
plt.rcParams['figure.figsize'] = (16,12)
plt.rcParams['font.size'] = 18

x = int_df.replace({'device': {'B8:27:EB:76:5F:45': 'Living Room', 'B8:27:EB:2D:40:28': 'Master Bedroom', 'B8:27:EB:37:B0:F8': 'Guest Bedroom', 'B8:27:EB:A9:D4:C2': 'Kitchen', 'DC:0:D30:48:FE:9C':'Master Bedroom'}}).groupby(['temp', 'device'])['hum'].min().unstack().fillna('-')
y = int_df.replace({'device': {'B8:27:EB:76:5F:45': 'Living Room', 'B8:27:EB:2D:40:28': 'Master Bedroom', 'B8:27:EB:37:B0:F8': 'Guest Bedroom', 'B8:27:EB:A9:D4:C2': 'Kitchen', 'DC:0:D30:48:FE:9C':'Master Bedroom'}}).groupby(['temp', 'device'])['hum'].max().unstack().fillna('-')
x  = x.join(y, how='left', lsuffix="_min", rsuffix='_max')
print(x)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 5</b> </font><font color=darkblue>: Clean up outliers.</font>
# </div>
#
# ## <font color=blue>Notice</font><font color=purple> the wide swing in </font><font color=red>temp </font><font color=purple> ranges. Let's use Z-Score and clean up the temp outliers.

# In[31]:


print(abs(stats.zscore(int_df['temp'])).min())
print(abs(stats.zscore(int_df['temp'])).mean())
print(abs(stats.zscore(int_df['temp'])).std())
print(abs(stats.zscore(int_df['temp'])).max())
int_df = int_df[abs(stats.zscore(int_df['temp'])) < 2.0]


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 6</b> </font><font color=darkblue>: Review new data.</font>
# </div>
#
# <font size=3 color=blue>Table 2c. Displays the temperature and humidity statistics.</font>

# In[32]:


print("------------------------------------")
print("Environment Data statistics:")
print("------------------------------------")
int_df.describe()


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 7</b> </font><font color=darkblue>: Clean up outliers.</font>
# </div>
#
# ## <font color=blue>Notice</font><font color=purple> the wide swing in </font><font color=red>hum </font><font color=purple> ranges. Let's use Z-Score and clean up the hum outliers.

# In[33]:


print(abs(stats.zscore(int_df['hum'])).min())
print(abs(stats.zscore(int_df['hum'])).mean())
print(abs(stats.zscore(int_df['hum'])).std())
print(abs(stats.zscore(int_df['hum'])).max())
int_df = int_df[abs(stats.zscore(int_df['hum'])) < 2.0 ]


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 8</b> </font><font color=darkblue>: Review new data.</font>
# </div>
#
# <font size=3 color=blue>Table 2d. Displays the **cleaned** temperature and humidity statistics.</font>

# In[34]:


print("------------------------------------")
print("Environment Data statistics:")
print("------------------------------------")
int_df.describe()


# <a id='sec2pt3'></a>
# ## <font color=darkblue>Part 3 - Rename columns, replace data and build index.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>
#
# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Change devices to match room location.</font>
# </div>

# In[35]:


int_df.replace({'device': {'B8:27:EB:76:5F:45': 'Living Room', 'B8:27:EB:2D:40:28': 'Master Bedroom', 'B8:27:EB:37:B0:F8': 'Guest Bedroom', 'B8:27:EB:A9:D4:C2': 'Kitchen', 'DC:0:D30:48:FE:9C':'Master Bedroom'}}, inplace=True)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: COLUMN renaming step to make columns meaningful.</font>
# </div>

# In[36]:


#int_df.rename(columns={'state': 'forecast'},inplace=True)
int_df.rename(columns={'device': 'room'}, inplace=True)


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>: Set sysdate as index for internal data.</font>
# </div>

# In[37]:


int_df.set_index(['sysdate'], inplace=True)
int_df.sort_index(inplace=True)


# <a id='sec2pt4'></a>
# ## <font color=darkblue>Part 4 - Breakout DataFrames by location.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>

# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Create new dataframes by device or internal location.</font>
# </div>
#
# ### <font color=purple>As this is taken during the summer, we can assume any temp below 70 F is to be removed.</font>

# In[38]:


mb = int_df[int_df['room'] == 'Master Bedroom']
lr = int_df[int_df['room'] == 'Living Room']
gb = int_df[int_df['room'] == 'Guest Bedroom']
kt = int_df[int_df['room'] == 'Kitchen']
#---------------------------------------------#
# Deprecated with Z-scoring above
#---------------------------------------------#
#mb = mb[mb['temp'] > 69]
#mb = mb[mb['temp'] < 90]
#lr = lr[lr['temp'] > 69]
#lr = lr[lr['temp'] < 90]
#gb = gb[gb['temp'] > 69]
#gb = gb[gb['temp'] < 90]
#kt = kt[kt['temp'] > 69]
#kt = kt[kt['temp'] < 90]


# <a id='sec2pt5'></a>
# ## <font color=darkblue>Part 5 - Visualize and Assess Environment Data.</font>
#
# <a href ='#top'>Jump to Table of Contents</a>

# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 1</b> </font><font color=darkblue>: Display data counts by device.</font>
# </div>
#
# <font size=3 color=blue>Table 2e. Displays the temperature and humidity counts by room</font>
#
# <font size=3 color=blue>Table 2f. Displays the temperature and humidity mean by room</font>

# In[39]:


int_df.groupby(['room']).count()


# In[40]:


int_df.groupby(['room']).mean()


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 2</b> </font><font color=darkblue>: Visualize the internal data's metadata.</font>
# </div>
#
# ### <font color = blue size=3>Figure 2c - Review environment metadata to assess if missing we are missing data.</font>

# In[ ]:


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 3</b> </font><font color=darkblue>: Merge internal data groups.</font>
# </div>

# In[41]:


br = mb.join(gb, how='left',  lsuffix='_mstr', rsuffix='_guest')
lk = lr.join(kt, how='left', lsuffix='_living', rsuffix='_kit')
inside = br.join(lk, how='left')


# <div class="alert alert-block alert-warning">
#     <b><font color=black>step 4</b> </font><font color=darkblue>: Visualize internal data.</font>
# </div>
#
# ### <font color = blue size=3>Table 2d - Cleaner environment temp with humidity</font>

# In[42]:


inside.plot()
plt.xlabel('Dates', color='green')
plt.ylabel('Temperature and Humidity', color='red')