---
title: "Asset Mapping Sensor Data Analysis"
url: "https://www.assetmapping.com"
author: "Nathan Sowatskey - https://github.com/NathanDotTo, https://www.kaggle.com/nathanto"
output: html_document
---
```{r load_libraries, echo = FALSE, warning = FALSE, message = FALSE}
#Use library() rather than require() so that a missing package fails immediately
library("plotly")
```
# Introduction
This is a study of time series data provided by [Asset Mapping](https://www.assetmapping.com). The analysis below shows that the data is a series of readings of the following main attributes:
- CO2, in percentage of atmosphere.
- Temperature, in degrees centigrade.
- Humidity, in units that vary between sensors.
- Noise average, in decibels.
- Noise peak, in decibels.
- Battery voltage.
## The Question
CO2 is one of the most important environmental measures, so this study focuses on whether CO2 can be predicted from the other variables. Two main relationships might hold:
- CO2 is predicted by Humidity and Temperature.
- CO2 is predicted by Humidity and Temperature, and indirectly by noise, where noise is a proxy for other activities that might increase CO2, such as air conditioning or people.
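In R's formula notation, the two candidate relationships above could be written as shown here. This is only a sketch: the variable names assume the value types `CO2`, `Humidity`, `Temperature` and `NoiseAvg` that appear later in the data, and `candidate_formulas` is a hypothetical name that is not used elsewhere in this study.

```r
# Hypothetical list of candidate model formulas (illustrative names only)
candidate_formulas <- list(
  climate_only = CO2 ~ Humidity + Temperature,
  climate_and_noise = CO2 ~ Humidity + Temperature + NoiseAvg
)
# List the variables that each candidate model would need
lapply(candidate_formulas, all.vars)
```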
# Data
The data source is a compressed SQL output file in the data directory. The format of the data is illustrated by this example:
```
object_name | object_name | time_stamp | value
------------------------+-------------+----------------------------+-------
London 70bf78cf (None) | NoiseAvg | 2017-02-02 15:16:04 | 50.9
London 9c850e0a (None) | NoisePeak | 2017-03-22 03:46:53 | 36.4
```
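The data starts at line 3, and observation fields are delimited by a `|`. Both of the first two column headings in the source are `object_name`, so the import below renames them: the first column becomes `IC_meter`, the name of the sensor or meter, and the second becomes `value_type`, the kind of reading, which we analyse below. The `time_stamp` column is the time of the reading, and the `value` column is the reading itself.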
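As a minimal sketch of how this layout can be read, the two sample rows above are parsed here from an in-memory string rather than from a file. The `sample_text` and `sample_df` names are illustrative only; the arguments mirror those used in the import chunk below.

```r
# The sample rows from above: a header line, a separator line, then the data
sample_text <- paste(
  "object_name | object_name | time_stamp | value",
  "------------------------+-------------+----------------------------+-------",
  "London 70bf78cf (None) | NoiseAvg | 2017-02-02 15:16:04 | 50.9",
  "London 9c850e0a (None) | NoisePeak | 2017-03-22 03:46:53 | 36.4",
  sep = "\n"
)
# Skip the two heading lines, split on "|" and trim the padding around fields
sample_df <- read.delim(text = sample_text, sep = "|", skip = 2, header = FALSE,
                        col.names = c("IC_meter", "value_type", "time_stamp", "value"),
                        stringsAsFactors = FALSE, strip.white = TRUE)
str(sample_df)
```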
TODO: Query data from source with SQL.
## Importing Data
```{r import_sensor_data}
#Import all source files in the data directory and store them in a matrix, where the row name is the file name
data_files <- list.files(path = "data", pattern = "\\.sql\\.gz$")
#The columns are the source data set, and the unique lists of meters and value_types in that data set
sensors_df_mtrx <- matrix(list(), nrow = length(data_files), ncol = 3)
rownames(sensors_df_mtrx) <- data_files
#Set column names explicitly as the column headings in the source are not what we need
column_names <- c("IC_meter", "value_type", "time_stamp", "value")
row <- 1
for (file in data_files) {
#Import data, skipping header rows and without factors for now, so that the data types can be more easily converted
sensors_df_mtrx[[row, 1]] <- read.delim(paste("data/", file, sep = ""), sep="|", skip = 2, col.names = column_names, header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
#Delete last row if the values are null.
nrow_sensor_df <- nrow(sensors_df_mtrx[[row, 1]])
if (sensors_df_mtrx[[row, 1]][nrow_sensor_df, 2] == "") {
sensors_df_mtrx[[row, 1]] <- sensors_df_mtrx[[row, 1]][-nrow_sensor_df,]
}
#Convert time_stamp string to POSIXct, note that the milliseconds are retained with the %OS format. To see milliseconds when printing, use:
#op <- options(digits.secs=6)
sensors_df_mtrx[[row, 1]]$time_stamp <- as.POSIXct(sensors_df_mtrx[[row, 1]]$time_stamp, tz = "UTC", format = "%Y-%m-%d %H:%M:%OS", usetz = TRUE)
#Remove spaces from meter names
sensors_df_mtrx[[row, 1]]$IC_meter <- gsub(" ", "_", sensors_df_mtrx[[row, 1]]$IC_meter)
row <- row + 1
}
```
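To illustrate the time stamp conversion used in the chunk above, here is a small sketch with a made-up time stamp that has fractional seconds. The `raw_time_stamp` value is an assumption for illustration and is not taken from the data.

```r
# A made-up time stamp with fractional seconds (not from the source data)
raw_time_stamp <- "2017-02-02 15:16:04.250"
# %OS reads the seconds including any fractional part
parsed <- as.POSIXct(raw_time_stamp, tz = "UTC", format = "%Y-%m-%d %H:%M:%OS")
# Show fractional seconds while printing, then restore the previous options
op <- options(digits.secs = 3)
print(parsed)
options(op)
# The fraction is kept in the underlying number of seconds since the epoch
fractional_part <- as.numeric(parsed) %% 1
```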
## Subset Data
```{r subset_data}
#Process sensors_df_mtrx to subset each source data frame into dataframes for each meter,
#and then replace the source data with the matrix of dataframe subsets
for (row in 1:nrow(sensors_df_mtrx)) {
sensor_df <- sensors_df_mtrx[[row, 1]]
ic_meters <- unique(sensor_df$IC_meter)
value_types <- unique(sensor_df$value_type)
#The matrix will hold a dataframe subset for each IC_meter and value_type, with the name of each row being the IC_meter, and the column the value_type
meter_value_type_subsets_mtrx <- matrix(list(), nrow = length(ic_meters), ncol = length(value_types))
rownames(meter_value_type_subsets_mtrx) <- ic_meters
colnames(meter_value_type_subsets_mtrx) <- value_types
col_counter <- 1
row_counter <- 1
for (icm in ic_meters) {
for (vt in value_types) {
meter_value_type_subsets_mtrx[[row_counter, col_counter]] <- subset(sensor_df, (IC_meter == icm & value_type == vt), select = -c(IC_meter, value_type))
col_counter <- col_counter + 1
}
col_counter <- 1
row_counter <- row_counter + 1
}
#Replace the original dataframe with the matrix of dataframe subsets
sensors_df_mtrx[[row, 1]] <- meter_value_type_subsets_mtrx
#Record the unique meter names and value_types
sensors_df_mtrx[[row, 2]] <- ic_meters
sensors_df_mtrx[[row, 3]] <- value_types
}
```
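For comparison, the same kind of per-meter, per-value-type subsetting can be sketched with base R's `split()`. This is an alternative technique, not the approach used in the chunk above, and `toy_df` is made-up data that only mimics the imported column names.

```r
# Made-up readings that mimic the imported columns
toy_df <- data.frame(
  IC_meter = c("London_A", "London_A", "London_B"),
  value_type = c("CO2", "Temperature", "CO2"),
  time_stamp = as.POSIXct(c("2017-02-02 15:16:04", "2017-02-02 15:17:04",
                            "2017-02-02 15:18:04"), tz = "UTC"),
  value = c(0.04, 21.5, 0.05),
  stringsAsFactors = FALSE
)
# One data frame per meter and value type combination, named "<meter>.<value_type>";
# combinations without readings are kept as empty data frames
subsets <- split(toy_df[c("time_stamp", "value")],
                 list(toy_df$IC_meter, toy_df$value_type))
names(subsets)
```

A named list avoids the manual row and column counters, at the cost of losing the matrix layout that the later chunks rely on.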
# Data Exploration
In the data exploration below, it can be seen that different sensors report different value ranges for "Humidity". For example, "Water Sensor 2" has values as low as 5 and as high as 227, or more, whereas "Sensor 2" has values ranging from below 25 to nearly 65. These are possibly not the same units.
Also, "Water Sensor 3" shows readings from almost 0 to above 50, but with only a few data points.
These variations suggest that we must take care to compare only like with like, using readings from the same sensors.
We can also see that, for a given sensor, the available data do not all come from the same time range. Importantly, this means that visually reviewing plots of the original data will not easily reveal any relationships. We will need to normalise for consistent time ranges.
Finally, if we want to model CO2 as predicted by the other variables, only two sensors have readings for CO2 as well as Humidity, Temperature and Noise, and each of them appears in two source files. We therefore restrict our modelling to data from those sensors, which are:
- London_70bf78cf, from 2_sensors_Dec_11_2016-Mar_26_2017.sql.gz.
- London_9c850e0a, from 2_sensors_Dec_11_2016-Mar_26_2017.sql.gz.
- London_70bf78cf, from 2_sensors_sensor_type_1.sql.gz.
- London_9c850e0a, from 2_sensors_sensor_type_1.sql.gz.
Of these, the readings from 2_sensors_sensor_type_1.sql.gz seem to be the most complete. These are the observations we shall use as the basis for modelling, which is presented after the data analysis graphs below.
```{r plotly_lines, warning = FALSE}
line_plot_gatherer <- htmltools::tagList()
plot_counter <- 0
for (row in 1:nrow(sensors_df_mtrx)) {
#Add the name of the source file
line_plot_gatherer[[(plot_counter <- plot_counter + 1)]] <- htmltools::h1(paste("Source file -", rownames(sensors_df_mtrx)[row]))
meter_value_type_subsets_mtrx <- sensors_df_mtrx[[row, 1]]
ic_meters <- sensors_df_mtrx[[row, 2]]
value_types <- sensors_df_mtrx[[row, 3]]
col_counter <- 1
row_counter <- 1
for (icm in ic_meters) {
for (vt in value_types) {
plot_df <- meter_value_type_subsets_mtrx[[row_counter, col_counter]]
if (nrow(plot_df) > 0) {
line_plot_gatherer[[(plot_counter <- plot_counter + 1)]] <- plot_ly(plot_df, x = ~time_stamp, y = ~value, type = "scatter", mode = "lines") %>%
layout(title = paste("Sensor -", icm, "- Value Type -", vt))
#Put some space between the plots
line_plot_gatherer[[(plot_counter <- plot_counter + 1)]] <- htmltools::h1("")
}
col_counter <- col_counter + 1
}
col_counter <- 1
row_counter <- row_counter + 1
}
}
line_plot_gatherer
```
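The choice of sensors that have all of the required value types can also be checked numerically rather than by eye. The sketch below uses made-up data and hypothetical names (`toy_df`, `required_types`, `complete_sensors`); it counts readings per sensor and value type and keeps the sensors that have at least one reading of every required type.

```r
# Made-up readings: only sensor S1 reports every required value type
toy_df <- data.frame(
  IC_meter = c("S1", "S1", "S1", "S2", "S2", "S3"),
  value_type = c("CO2", "Humidity", "Temperature", "CO2", "Humidity", "Humidity"),
  stringsAsFactors = FALSE
)
required_types <- c("CO2", "Humidity", "Temperature")
# Count the readings for each sensor (rows) and value type (columns)
counts <- table(toy_df$IC_meter, toy_df$value_type)
# Keep the sensors that have at least one reading of every required type
has_all <- apply(counts[, required_types, drop = FALSE] > 0, 1, all)
complete_sensors <- rownames(counts)[has_all]
complete_sensors
```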
# Modelling Analysis
Based on the data analysis above, we are going to focus the modelling analysis on these portions of the data:
- London_70bf78cf, from 2_sensors_sensor_type_1.sql.gz.
- London_9c850e0a, from 2_sensors_sensor_type_1.sql.gz.
The first step is to organise the data so that readings for the dependent variable, CO2, are associated with readings for the potential explanatory variables, Temperature, Humidity and Noise Average, in a single observation. To do this, we need to choose readings for all variables that fall within the same time period, and create observation sets from these.
## Arranging Observations
First we select the data subsets that we want to focus on:
```{r subset_data_for_model}
source_file <- "2_sensors_sensor_type_1.sql.gz"
sensors <- c("London_70bf78cf", "London_9c850e0a")
model_df_mtrx <- matrix(list(), nrow = length(sensors), 5)
for (row in 1:nrow(sensors_df_mtrx)) {
#Select the specific source file
if (rownames(sensors_df_mtrx)[row] != source_file) {
next
}
meter_value_type_subsets_mtrx <- sensors_df_mtrx[[row, 1]]
ic_meters <- sensors_df_mtrx[[row, 2]]
value_types <- sensors_df_mtrx[[row, 3]]
col_counter <- 1
src_row_counter <- 1
dest_row_counter <- 1
row_names <- c()
col_names <- c()
for (icm in ic_meters) {
if (!(icm %in% sensors)) {
src_row_counter <- src_row_counter + 1
next
}
row_names <- c(row_names, icm)
for (vt in value_types) {
model_df <- meter_value_type_subsets_mtrx[[src_row_counter, col_counter]]
if (nrow(model_df) > 0) {
#Sort the data by the full time_stamp as we store it. The time_stamp is already POSIXct, and converting it with as.Date would discard the time of day
model_df_mtrx[[dest_row_counter, col_counter]] <- model_df[order(model_df$time_stamp),]
if(dest_row_counter == 1) {
col_names <- c(col_names, vt)
}
}
col_counter <- col_counter + 1
}
col_counter <- 1
src_row_counter <- src_row_counter + 1
dest_row_counter <- dest_row_counter + 1
}
rownames(model_df_mtrx) <- row_names
colnames(model_df_mtrx) <- col_names
}
```
Next we need to find the date range over which the observations for all value types overlap.
```{r date_range}
sensors <- rownames(model_df_mtrx)
value_types <- colnames(model_df_mtrx)
#Start from extreme bounds and narrow them down to the common date range
low_date <- as.POSIXct("0000-1-1 00:00:00", tz = "UTC", format = "%Y-%m-%d %H:%M:%OS")
#Should be good for a while
high_date <- as.POSIXct("9999-1-1 00:00:00", tz = "UTC", format = "%Y-%m-%d %H:%M:%OS")
#A single pass over every sensor and value type is enough
for (row_counter in seq_along(sensors)) {
  for (col_counter in seq_along(value_types)) {
    sensor_vt_df <- model_df_mtrx[[row_counter, col_counter]]
    #Skip sensor and value type combinations that have no readings
    if (is.null(sensor_vt_df) || nrow(sensor_vt_df) == 0) {
      next
    }
    #The highest of the lowest dates will be the bottom of the common date range,
    #and the lowest of the highest dates will be the top of the common date range.
    #The time_stamp column is already POSIXct, and min and max do not depend on the row order.
    first_time <- min(sensor_vt_df$time_stamp)
    last_time <- max(sensor_vt_df$time_stamp)
    if (low_date < first_time) {
      low_date <- first_time
    }
    if (high_date > last_time) {
      high_date <- last_time
    }
  }
}
```
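The idea behind the common date range, taking the latest start and the earliest end, can be shown on a small made-up example. The `readings` list and its dates are assumptions for illustration only.

```r
# Made-up first and last reading times for three value types
readings <- list(
  CO2 = as.POSIXct(c("2017-01-01 00:00:00", "2017-01-10 00:00:00"), tz = "UTC"),
  Temperature = as.POSIXct(c("2017-01-03 00:00:00", "2017-01-12 00:00:00"), tz = "UTC"),
  Humidity = as.POSIXct(c("2016-12-20 00:00:00", "2017-01-08 00:00:00"), tz = "UTC")
)
# Work in seconds since the epoch so that the values combine cleanly in sapply
starts <- sapply(readings, function(x) as.numeric(min(x)))
ends <- sapply(readings, function(x) as.numeric(max(x)))
# The latest start and the earliest end bound the period covered by all value types
common_low <- as.POSIXct(max(starts), origin = "1970-01-01", tz = "UTC")
common_high <- as.POSIXct(min(ends), origin = "1970-01-01", tz = "UTC")
```

Here the common range runs from the Temperature start to the Humidity end, because those are the latest start and the earliest end respectively.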
Combine all of the data into a single data frame, restricted to the common date range.
```{r combine_data}
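value_types <- colnames(model_df_mtrx)
#Empty data frame with one column per value type, ready to hold the combined observations
all_observations_df <- read.table(text="", col.names = value_types)
#TODO: Fill all_observations_df for each sensor. The time bucket processing further below gathers the observations that are used for modelling.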
```
Plot the subset of data on one graph to see if there are any obvious correlations, noting that the units are different for all reading types.
```{r plotly_lines_correlations, warning = FALSE}
correlation_plot_gatherer <- htmltools::tagList()
#Each row in model_df_mtrx is a sensor, and each sensor gets one graph
for (row in 1:nrow(model_df_mtrx)) {
  sensor_plot <- plot_ly()
  #Each column in model_df_mtrx is a value type, added as one trace per value type
  for (col in seq_along(value_types)) {
    sensor_vt_df <- model_df_mtrx[[row, col]]
    if (is.null(sensor_vt_df)) {
      next
    }
    #Restrict the readings to the common date range
    sensor_vt_df <- subset(sensor_vt_df, time_stamp >= low_date & time_stamp <= high_date)
    sensor_plot <- add_trace(sensor_plot, data = sensor_vt_df, x = ~time_stamp, y = ~value, name = value_types[col], type = "scatter", mode = "lines")
  }
  correlation_plot_gatherer[[row]] <- layout(sensor_plot, title = paste("Sensor -", rownames(model_df_mtrx)[row]))
}
correlation_plot_gatherer
```
Given the date range between `r low_date` and `r high_date` we can trim the data sets down as we reformat the data into observations.
What we need to do here is:
- Start from the `r low_date`, up to `r high_date`
- Put all observations for all value types into time `buckets` where the value of the reading for the time bucket is a mean of readings in that time bucket
- Gather all observations for the same time buckets into a single observation set for the given sensor/meter
```{r trim_data_sets}
#Create a set of time buckets, in minutes, into which we want to gather the data.
#With a range of time buckets we can analyse the sensitivity of the predictive capability of the model for different time buckets, in minutes
time_buckets <- c(1, 3, 5, 10, 20, 40, 60, 120, 180, 240) #Increase with buckets up to 60 min or more
#Create a matrix to hold sets of observations for different time buckets
all_observations_mtrx <- matrix(list(), nrow = length(time_buckets), ncol = 1)
#This has been defined already, but doing so again here for readability.
value_types <- colnames(model_df_mtrx)
all_obs_row_counter <- 1
for (time_bucket in time_buckets) {
start_time <- Sys.time()
#Create a data frame for all observations for all value types for a given time bucket
time_bucket_observations_df <- read.table(text="", col.names = value_types)
time_bucket_secs = time_bucket * 60
#Each row in model_df_mtrx is a sensor
for (row in 1:nrow(model_df_mtrx)) {
#Use the loop index as the sensor row, so that each sensor is processed in turn
model_df_mtrx_row_counter <- row
#Observations for this sensor are appended after those already gathered for earlier sensors
sensor_row_offset <- nrow(time_bucket_observations_df)
#Each column in the model_df_mtrx is a value type
model_df_mtrx_col_counter <- 1
#For each value type for a given sensor
for (vt in value_types) {
#Get the data frame for that value type
sensor_vt_df <- model_df_mtrx[[model_df_mtrx_row_counter, model_df_mtrx_col_counter]]
#Subset dataframe to between the low_date and high_date
sensor_vt_df <- subset(sensor_vt_df, sensor_vt_df$time_stamp >= low_date & sensor_vt_df$time_stamp <= high_date)
#The counter for the mean values observation row, which is reset to the first row for this sensor for each value type/column
obs_row_counter <- sensor_row_offset + 1
#Set the bucket to start at the low date
bucket_start <- low_date
#Go through the data selecting observations that fit into the time bucket
while (bucket_start + time_bucket_secs < high_date) {
#Subset to observations in time_bucket, note that the upper range is less than but not equal
time_bucket_vt_df <- subset(sensor_vt_df, sensor_vt_df$time_stamp >= bucket_start & sensor_vt_df$time_stamp < bucket_start + time_bucket_secs)
#Record the mean observation from the subset of observations in the time bucket, else record 0 if there are no observations
if (nrow(time_bucket_vt_df) > 0) {
time_bucket_observations_df[obs_row_counter, model_df_mtrx_col_counter] <- mean(time_bucket_vt_df$value)
} else {
time_bucket_observations_df[obs_row_counter, model_df_mtrx_col_counter] <- 0
}
obs_row_counter <- obs_row_counter + 1
#Increment the bucket by the amount of seconds
bucket_start <- bucket_start + time_bucket_secs
}
#Move on to the next column/value type
model_df_mtrx_col_counter <- model_df_mtrx_col_counter + 1
}
#Move on to the next row/sensor
model_df_mtrx_row_counter <- model_df_mtrx_row_counter + 1
}
all_observations_mtrx[[all_obs_row_counter, 1]] <- time_bucket_observations_df
all_obs_row_counter <- all_obs_row_counter + 1
print(paste("Time bucket of", time_bucket, "took", as.numeric(Sys.time()-start_time, units="secs"), "seconds, with", nrow(time_bucket_observations_df), "observations."))
}
```
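The bucketing step can also be expressed in vectorised form. The sketch below is an alternative to the loop above, not a replacement for it: `bucket_means` is a hypothetical helper, the readings are made up, and, unlike the chunk above, buckets without readings are simply absent from the result rather than recorded as 0.

```r
# Hypothetical helper: mean reading per time bucket, counted from `start`
bucket_means <- function(time_stamp, value, start, bucket_secs) {
  # Zero-based index of the bucket that each reading falls into
  elapsed_secs <- as.numeric(difftime(time_stamp, start, units = "secs"))
  bucket_index <- floor(elapsed_secs / bucket_secs)
  # Mean of the readings in each bucket that has at least one reading
  tapply(value, bucket_index, mean)
}

# Made-up readings at 10, 50, 70 and 200 seconds after the start
start <- as.POSIXct("2017-02-02 15:00:00", tz = "UTC")
time_stamp <- start + c(10, 50, 70, 200)
value <- c(1, 3, 5, 7)
means <- bucket_means(time_stamp, value, start, bucket_secs = 60)
means
```

With one-minute buckets, the first two readings share bucket 0, the third falls in bucket 1, and the fourth falls in bucket 3.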
## Analysing Correlations
Given the observations gathered in `all_observations_mtrx`, with one data frame per time bucket, we can now fit linear models to see which variables explain `CO2`.
```{r analyse_model_correlations}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ .), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
In general, the results show that CO2 is predicted by Humidity, Temperature and Noise.
The results show that models for larger time buckets have a greater R-squared value.
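To compare R-squared values across time buckets without reading through every model summary, the values can be collected into one table. The sketch below uses synthetic data generated purely for illustration, so its numbers say nothing about the sensor data; `make_toy_obs`, `toy_obs_list` and `r_squared_df` are hypothetical names.

```r
set.seed(42)
# Synthetic observations, for illustration only (not the sensor data)
make_toy_obs <- function(n) {
  toy <- data.frame(Temperature = rnorm(n, 21, 2), Humidity = rnorm(n, 45, 5))
  toy$CO2 <- 0.5 * toy$Temperature + 0.2 * toy$Humidity + rnorm(n)
  toy
}
# One data frame per time bucket, named by the bucket size in minutes
toy_obs_list <- list("1" = make_toy_obs(300), "60" = make_toy_obs(100))
# Fit the same model to each data frame and keep the adjusted R-squared
r_squared <- sapply(toy_obs_list, function(obs) {
  summary(lm(CO2 ~ ., data = obs))$adj.r.squared
})
r_squared_df <- data.frame(time_bucket = as.numeric(names(r_squared)),
                           adj_r_squared = unname(r_squared))
r_squared_df
```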
## Parsimonious Models
Here we examine whether more parsimonious models are possible.
```{r Temperature_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ Temperature), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
```{r Humidity_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ Humidity), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
```{r Temperature_Humidity_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ Humidity + Temperature), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
```{r NoiseAvg_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ NoiseAvg), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
```{r NoisePeak_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ NoisePeak), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
```{r Temperature_Humidity_NoiseAvg_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ Humidity + Temperature + NoiseAvg), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
```{r Temperature_Humidity_NoisePeak_model}
#Loop through the data frames in all_observations_mtrx to examine possible correlations
for (obs_row in 1:nrow(all_observations_mtrx)) {
print(paste("Model for time bucket of", time_buckets[obs_row], "minutes."))
allmodels <- step(lm(data = all_observations_mtrx[[obs_row, 1]], CO2 ~ Humidity + Temperature + NoisePeak), trace=0)
summary_all <- summary(allmodels)
print(summary_all)
print("--")
}
```
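One way to judge whether a smaller model is sufficient is to compare nested models directly, for example with an F-test from `anova()` or with `AIC()`. This is a sketch of that technique on synthetic data, not an analysis of the sensor data; `toy_obs`, `small_model` and `large_model` are hypothetical names.

```r
set.seed(7)
n <- 200
# Synthetic observations in which NoiseAvg has no effect on CO2 by construction
toy_obs <- data.frame(Temperature = rnorm(n, 21, 2),
                      Humidity = rnorm(n, 45, 5),
                      NoiseAvg = rnorm(n, 50, 4))
toy_obs$CO2 <- 0.5 * toy_obs$Temperature + 0.2 * toy_obs$Humidity + rnorm(n)
small_model <- lm(CO2 ~ Humidity + Temperature, data = toy_obs)
large_model <- lm(CO2 ~ Humidity + Temperature + NoiseAvg, data = toy_obs)
# F-test of whether the extra term improves the fit
comparison <- anova(small_model, large_model)
# Lower AIC indicates the preferred model
aic_values <- AIC(small_model, large_model)
print(comparison)
print(aic_values)
```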
# OpenTSDB Format
This section writes data for import into [OpenTSDB](http://opentsdb.net), which could then be used with [Grafana](https://grafana.com/), for example.
The data format, from the [OpenTSDB import documentation](http://opentsdb.net/docs/build/html/user_guide/cli/import.html), is:
```
<metric> <timestamp> <value> <tagk=tagv> [<tagkN=tagvN>]
```
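As a sketch of how one reading maps onto this format, the hypothetical helper `to_tsdb_line` below builds a single import line with a millisecond time stamp and a `meter` tag. The metric and meter names are examples that follow the naming used in the export chunk further down.

```r
# Hypothetical helper: format one reading as an OpenTSDB import line
to_tsdb_line <- function(metric, time_stamp, value, meter) {
  # Milliseconds since the epoch, written without scientific notation
  millis <- as.numeric(time_stamp) * 1000
  sprintf("%s %.0f %s meter=%s", metric, millis, format(value), meter)
}

reading_time <- as.POSIXct("2017-02-02 15:16:04", tz = "UTC")
tsdb_line <- to_tsdb_line("London_70bf78cf.NoiseAvg", reading_time, 50.9, "London_70bf78cf")
tsdb_line
```

Using `sprintf` with `%.0f` avoids relying on a global `scipen` option to keep the time stamp out of scientific notation.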
The data has to be made available on a file system that the OpenTSDB `import` command can see, and can then be imported like this:
```
for f in `ls *.tsdb`; do echo $f; sudo /usr/share/opentsdb/bin/tsdb import $f --auto-metric=true; done
```
```{r write_opentsdb_format}
invisible(file.remove(file.path("tsdb", list.files("tsdb"))))
dir.create("tsdb", showWarnings = FALSE)
for (row in 1:nrow(sensors_df_mtrx)) {
meter_value_type_subsets_mtrx <- sensors_df_mtrx[[row, 1]]
ic_meters <- sensors_df_mtrx[[row, 2]]
value_types <- sensors_df_mtrx[[row, 3]]
col_counter <- 1
row_counter <- 1
for (icm in ic_meters) {
for (vt in value_types) {
export_df <- meter_value_type_subsets_mtrx[[row_counter, col_counter]]
if (nrow(export_df) == 0) {
#Still move on to the next column/value type when skipping an empty subset
col_counter <- col_counter + 1
next
}
#Remove spaces
icm <- gsub(" ", "_", icm)
vt <- gsub(" ", "_", vt)
#Convert parentheses to hyphens
icm <- gsub("\\(", "-", icm)
icm <- gsub("\\)", "-", icm)
#Add the meter and the value type as the metric column at the beginning of the dataframe to help ensure uniqueness
metric <- paste(icm, vt, sep = ".")
export_df <- cbind(comp_metric = metric, export_df)
#Add the meter as a tag at the end of the dataframe
export_df$tag <- paste("meter", icm, sep = "=")
#Change timestamp to milliseconds
export_df$time_stamp <- as.numeric(as.POSIXlt(export_df$time_stamp, tz='UTC')) * 1000
#Then restrict to 13 characters
export_df$time_stamp <- strtrim(as.character(export_df$time_stamp), 13)
#Turn off scientific formatting, in effect
options(scipen=500)
#Make sure that all observations are unique
export_df <- unique(export_df)
#Order data in ascending order of time_stamp
export_df <- export_df[order(export_df$time_stamp),]
#Export data
export_file_name <- paste("tsdb/", icm, ".", vt, ".tsdb", sep = "")
#TODO use HTTP API to import data, though the batch import would probably be better for bulk data
write.table(export_df, file = export_file_name, sep = " ", quote = FALSE, row.names = FALSE, col.names = FALSE)
#Create a utility to delete the data that has been added, just so that it is easier to tidy up whilst testing
write.table(paste("sudo /usr/share/opentsdb/bin/tsdb scan 1970/01/01-00:00:00 sum", metric, "--delete") , file = "tsdb/delete_data.sh", append = TRUE, row.names = FALSE, col.names = FALSE, quote = FALSE)
system("chmod +x tsdb/delete_data.sh")
col_counter <- col_counter + 1
}
col_counter <- 1
row_counter <- row_counter + 1
}
}
```