The goal of this tutorial is to write Python code that downloads data from my weather station (https://fulmanski.pl/data/weather/home/) and then uploads this data to HBase.
- Prerequisites
- Data source characteristic
- Download data with Python
- Choose suitable HBase data organisation
- Upload data to HBase with Python
- Retrieving data from HBase with Python
- Summary
The prerequisite for this tutorial is to have HBase installed, along with Python and the happybase Python library. I described this process in my tutorial Install and work with Apache HBase. I use PyCharm to simplify managing my Python code; you can do the same or choose any other way that suits you.
Start by running the Thrift service, which you will use to communicate with HBase:
    nosql@nosql:~$ /home/nosql/Pulpit/nosql2/hbase-2.5.0/bin/hbase-daemon.sh start thrift
    running thrift, logging to /home/nosql/Pulpit/nosql2/hbase-2.5.0/bin/../logs/hbase-nosql-thrift-nosql.out
and next HBase:
    nosql@nosql:~$ /home/nosql/Pulpit/nosql2/hbase-2.5.0/bin/start-hbase.sh
    running master, logging to /home/nosql/Pulpit/nosql2/hbase-2.5.0/bin/../logs/hbase-nosql-master-nosql.out
    nosql@nosql:~$ jps
    27715 ThriftServer
    28195 HMaster
    28282 Jps
Start the HBase shell to interact with HBase directly:
    nosql@nosql:~$ /home/nosql/Pulpit/nosql2/hbase-2.5.0/bin/hbase shell
To get data you have to make a request of the following form:
    https://fulmanski.pl/data/weather/home/get/{year}/{month}/{day}
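The URL pattern above can also be filled in from Python. The following is only a minimal sketch: the helper name build_url is my assumption for illustration, and the only thing taken from the tutorial is the URL pattern with a two-digit month and day.

```python
from datetime import date


def build_url(year: int, month: int, day: int) -> str:
    """Return the request URL for the given calendar day (hypothetical helper)."""
    # date() raises ValueError for impossible dates such as 2023-02-30.
    d = date(year, month, day)
    # The :02d format spec pads month and day with a leading zero.
    return (
        "https://fulmanski.pl/data/weather/home/get/"
        f"{d.year}/{d.month:02d}/{d.day:02d}"
    )


print(build_url(2023, 9, 5))
# prints https://fulmanski.pl/data/weather/home/get/2023/09/05
```

Validating the date before building the URL means that a typo such as month 13 fails locally instead of producing a request the server cannot answer.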
For example, if you type the following URL in your web browser:
    https://fulmanski.pl/data/weather/home/get/2023/09/05
you will get the following one-row data:
    {"result":"ok","description":"ok","forDate":"20230905","data":"{\"timestamp_server\": \"20230905000006\", \"data\": {\"timestamp\": \"20230904230021\", \"data\": [{\"type\": \"DS18B20\", \"id_rom\": \"281c0489060000bc\", \"temperature_c\": \"17.500000\", \"location\": \"outside\"}, {\"type\": \"DS3231\", \"temperature_c\": \"25.500000\"}, {\"type\": \"dht\", \"humidity\": \"36.700001\", \"temperature_c\": \"16.799999\", \"heat_index_c\": \"15.493677\"}, {\"type\": \"MPL3115A2\", \"preasure_hPa\": \"1012.397522\", \"temperature_c\": \"17.062500\"}, {\"type\": \"light_10kO\", \"ADC_VALUE\": \"1015\", \"ADC_VALUE_MAX\": \"1023\"}]}}\n{\"timestamp_server\": \"20230905000016[...]\n"}
This is a typical JSON document, organized as described below:
    {
      "result":"ok",  [Should always be "ok"]
      "description":"ok",  [Should always be "ok"]
      "forDate":"20230905",  [Date for which you get data]
      "data":  [Data (see comment below)]
      "  [Beginning of all JSONs]
        {  [Beginning of the first JSON]
          \"timestamp_server\": \"20230905000006\",
          \"data\": {
            \"timestamp\": \"20230904230021\",
            \"data\": [
              {
                \"type\": \"DS18B20\",  [DS18B20 temperature sensor]
                \"id_rom\": \"281c0489060000bc\",  [Sensor ID]
                \"temperature_c\": \"17.500000\",  [Temperature in degrees Celsius measured by the sensor]
                \"location\": \"outside\"  [Where it is located]
              },
              {
                \"type\": \"DS3231\",  [Internal temperature sensor of the DS3231 RTC]
                \"temperature_c\": \"25.500000\"  [Temperature in degrees Celsius measured by the sensor]
              },
              {
                \"type\": \"dht\",  [Integrated temperature and humidity sensor of DHT21 type]
                \"humidity\": \"36.700001\",  [Humidity in percent measured by the sensor]
                \"temperature_c\": \"16.799999\",  [Temperature in degrees Celsius measured by the sensor]
                \"heat_index_c\": \"15.493677\"  [Perceived temperature in degrees Celsius measured by the sensor]
              },
              {
                \"type\": \"MPL3115A2\",  [Pressure sensor]
                \"preasure_hPa\": \"1012.397522\",  [Air pressure in hPa]
                \"temperature_c\": \"17.062500\"  [Temperature in degrees Celsius measured by the sensor]
              },
              {
                \"type\": \"light_10kO\",  [Light sensor based on a photoresistor]
                \"ADC_VALUE\": \"1015\",  [Current value; 0 for full light, 1023 for full darkness]
                \"ADC_VALUE_MAX\": \"1023\"  [Maximum value you should expect; however, sometimes you may get 1024]
              }
            ]
          }
        }  [End of the first JSON]
        \n
        {  [Beginning of the second JSON]
          \"timestamp_server\": \"20230905000016
          [...]
        \n
      "  [End of all JSONs]
    }
Comment about the data field: the value of this field is text. It consists of multiple JSONs separated by the newline character \n. Every single JSON contains data for a given moment in time and starts with the timestamp_server field (with the value "20230905000006" in the above data), followed by the data field, whose first field holds the weather station timestamp. The two timestamps should differ by no more than a few seconds, except when the clocks have been changed to winter or summer time; in our case you have "timestamp": "20230904230021". If you are curious, the server time is the correct one, because I don't adjust the time settings of my weather station when the time shifts by +/-1 hour at the seasonal change.
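The two-level structure described above (an outer JSON whose data field is text holding one JSON per line) can be sketched as follows. This is a minimal illustration under stated assumptions: the sample response is hand-made and shortened (empty sensor lists, and the second record's station timestamp is invented), and the helper names parse_records and clock_offset_seconds are mine.

```python
import json
from datetime import datetime

# Hand-made, shortened sample in the same shape as the server's answer.
# The second record's station timestamp is invented for illustration.
sample_response = json.dumps({
    "result": "ok",
    "description": "ok",
    "forDate": "20230905",
    "data": "\n".join([
        json.dumps({"timestamp_server": "20230905000006",
                    "data": {"timestamp": "20230904230021", "data": []}}),
        json.dumps({"timestamp_server": "20230905000016",
                    "data": {"timestamp": "20230904230031", "data": []}}),
    ]) + "\n",
})


def parse_records(response_text):
    """Decode the outer JSON, then each non-empty line of its "data" field."""
    outer = json.loads(response_text)
    return [json.loads(line) for line in outer["data"].split("\n") if line]


def clock_offset_seconds(record):
    """Server time minus weather station time, in seconds."""
    fmt = "%Y%m%d%H%M%S"
    server = datetime.strptime(record["timestamp_server"], fmt)
    station = datetime.strptime(record["data"]["timestamp"], fmt)
    return (server - station).total_seconds()


records = parse_records(sample_response)
print(len(records))                      # prints 2
print(clock_offset_seconds(records[0]))  # prints 3585.0
```

The offset of 3585 seconds (one hour minus 15 seconds) for the first record matches the remark above: the station clock was not moved when the time changed, so it runs about an hour behind the server.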
Now you are ready to write Python code to download the data and parse it. The code I present below is simple rather than sophisticated, because I want to keep it understandable:
    import json
    import urllib.request


    def getDataFromServer(year, month, day):
        # The server expects a two-digit month and day
        if day < 10:
            day = f"0{day}"
        if month < 10:
            month = f"0{month}"

        url = f"https://fulmanski.pl/data/weather/home/get/{year}/{month}/{day}"

        dataURL = urllib.request.urlopen(url)
        data = dataURL.read()
        dataJSON = data.decode('utf-8')
        data = json.loads(dataJSON)

        # Stays empty if the server does not answer with "ok"
        dataToReturn = []

        if data["result"] == "ok":
            # The "data" field is text: one JSON document per line
            for row in data["data"].split("\n"):
                if not row == "":
                    rowJSON = json.loads(row)
                    timeServer = rowJSON["timestamp_server"]
                    time = rowJSON["data"]["timestamp"]

                    temperature_DS18B20_outside = None
                    temperature_DS3231 = None
                    humidity_dht = None
                    temperature_dht = None
                    heat_index_dht = None
                    preasure_MPL3115A2 = None
                    temperature_MPL3115A2 = None
                    adc_value_light_10kO = None

                    # Pick the readings of interest from every sensor entry
                    for sensor in rowJSON["data"]["data"]:
                        if sensor["type"] == "DS18B20" and sensor["id_rom"] == "281c0489060000bc":
                            temperature_DS18B20_outside = sensor["temperature_c"]
                        elif sensor["type"] == "DS3231":
                            temperature_DS3231 = sensor["temperature_c"]
                        elif sensor["type"] == "dht":
                            humidity_dht = sensor["humidity"]
                            temperature_dht = sensor["temperature_c"]
                            heat_index_dht = sensor["heat_index_c"]
                        elif sensor["type"] == "MPL3115A2":
                            preasure_MPL3115A2 = sensor["preasure_hPa"]
                            temperature_MPL3115A2 = sensor["temperature_c"]
                        elif sensor["type"] == "light_10kO":
                            adc_value_light_10kO = sensor["ADC_VALUE"]

                    dataRow = {"timeServer": timeServer,
                               "time": time,
                               "temperature_DS18B20_outside": temperature_DS18B20_outside,
                               "temperature_DS3231": temperature_DS3231,
                               "humidity_dht": humidity_dht,
                               "temperature_dht": temperature_dht,
                               "heat_index_dht": heat_index_dht,
                               "preasure_MPL3115A2": preasure_MPL3115A2,
                               "temperature_MPL3115A2": temperature_MPL3115A2,
                               "adc_value_light_10kO": adc_value_light_10kO}

                    dataToReturn.append(dataRow)

        return dataToReturn


    if __name__ == '__main__':
        data = getDataFromServer(2023, 9, 5)
        for d in data:
            # print() separates its arguments with single spaces
            print(d["timeServer"],
                  d["time"],
                  d["temperature_DS18B20_outside"],
                  d["temperature_DS3231"],
                  d["humidity_dht"],
                  d["temperature_dht"],
                  d["heat_index_dht"],
                  d["preasure_MPL3115A2"],
                  d["temperature_MPL3115A2"],
                  d["adc_value_light_10kO"])
For simplicity, I pasted this code into the main.py file of a new PyCharm Python project. You can run it directly from PyCharm or from the command line:
    nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/get_weather_data_from_server/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/get_weather_data_from_server/main.py
where /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/get_weather_data_from_server/ is the location of the project.
Regardless of the method, the result is the same: 8264 lines in total, with the first line:
    20230905000006 20230904230021 17.500000 25.500000 36.700001 16.799999 15.493677 1012.397522 17.062500 1015
and the last:
    20230905235958 20230905230013 17.437500 25.750000 35.500000 17.000000 15.682342 1008.822510 16.937500 1013
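Note that all readings printed above are still strings, exactly as they arrive in the JSON. If you want to calculate with them, you can convert them first. The following is only a sketch: the helper name to_numbers is my assumption, and the sample row is a shortened copy of the first output line with one reading deliberately set to None.

```python
def to_numbers(row):
    """Return a copy of a parsed row with sensor readings as floats.

    Timestamps stay as strings; missing readings (None) stay None.
    """
    timestamp_keys = {"timeServer", "time"}
    converted = {}
    for key, value in row.items():
        if key in timestamp_keys or value is None:
            converted[key] = value
        else:
            converted[key] = float(value)
    return converted


# Shortened sample row; temperature_DS3231 is set to None on purpose
row = {"timeServer": "20230905000006",
       "time": "20230904230021",
       "temperature_DS18B20_outside": "17.500000",
       "temperature_DS3231": None,
       "adc_value_light_10kO": "1015"}

numeric = to_numbers(row)
print(numeric["temperature_DS18B20_outside"])  # prints 17.5
```

The rest of this tutorial keeps the readings as strings, because they are stored in HBase as JSON text anyway.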
Now comes the most important part of working with NoSQL databases: you have to decide how you want to aggregate (organize) your data.
For example, you can group your data (for one day) by hour. This way a date like 20230905 would be the row key, and the hours from 00 to 23 would be the keys of column families. Then you can put as many columns in each family as you want. For example, you can create their keys based on the minutes part of the timestamp: for the last row this would be 59. Now if you provide the key:
    [20230905][23][10]
you will get all data collected at 23:10 on September 5th, 2023, regardless of the seconds.
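The mapping from a timestamp to this three-part key can be sketched as below. The helper name hbase_coordinates is my assumption; the slicing simply follows the YYYYMMDDhhmmss layout of the timestamps shown earlier.

```python
def hbase_coordinates(timestamp):
    """Split a YYYYMMDDhhmmss timestamp into (row key, family, column, seconds)."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError(f"expected 14 digits, got {timestamp!r}")
    row_key = timestamp[:8]     # date, e.g. 20230905
    family = timestamp[8:10]    # hour, e.g. 23
    column = timestamp[10:12]   # minutes, e.g. 59
    seconds = timestamp[12:14]  # kept inside the cell value, not in the key
    return row_key, family, column, seconds


print(hbase_coordinates("20230905235958"))
# prints ('20230905', '23', '59', '58')
```

With this layout the seconds are not part of the key, so all readings from the same minute end up in the same cell.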
Of course, there are many different ways to aggregate your data, for example by sensor type or by parameters like temperature or humidity; the right choice strongly depends on your future needs.
Whatever you decide, you have to prepare the database. Log in to the HBase shell and run:
    hbase:001:0> list
    TABLE
    test:table1
    1 row(s)
    Took 0.7455 seconds
    => ["test:table1"]
    hbase:002:0> create 'weatherstation', '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23'
    Created table weatherstation
    Took 1.0767 seconds
    => Hbase::Table - weatherstation
    hbase:003:0> list
    TABLE
    weatherstation
    test:table1
    2 row(s)
    Took 0.0117 seconds
    => ["weatherstation", "test:table1"]
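If you prefer to create the table from Python instead of the shell, a sketch along these lines should work with happybase. It is not the method used in this tutorial: the helper names hourly_families and create_weather_table are my assumptions, and the function expects an already open happybase.Connection, so the block itself does not need a running HBase.

```python
def hourly_families():
    """Map the 24 family names '00'..'23' to empty option dicts."""
    return {f"{hour:02d}": dict() for hour in range(24)}


def create_weather_table(connection, name="weatherstation"):
    """Create the table unless it already exists.

    `connection` is assumed to be an open happybase.Connection.
    """
    # Table names may come back as bytes, so normalise them to str.
    existing = [t.decode("utf-8") if isinstance(t, bytes) else t
                for t in connection.tables()]
    if name not in existing:
        connection.create_table(name, hourly_families())


print(list(hourly_families())[:3])  # prints ['00', '01', '02']
```

With happybase installed and the Thrift service running, you would call it as create_weather_table(happybase.Connection(host='127.0.0.1', port=9090)).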
To implement this way of aggregating, add a new function to your code:
    def grupData(data, by=None):
        if by is None:
            return None
        elif by == "HOUR":
            dataToReturn = {}

            l = len(data)
            # Report progress roughly every 10%; max() avoids a zero step
            # (and a division by zero) when there are fewer than 10 rows
            m = max(1, l // 10)
            c = 0
            for d in data:
                # timeServer has the form YYYYMMDDhhmmss
                hour = d["timeServer"][8:10]

                c += 1
                if (c % m) == 0:
                    print(f"{c} of {l} ({int(100 * c / l)}%)")

                if hour not in dataToReturn:
                    dataToReturn[hour] = []

                dataToReturn[hour].append(d)

            return dataToReturn
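The same grouping can be written more compactly with collections.defaultdict. This is just an alternative sketch without the progress output; the name group_by_hour and the three sample rows are my assumptions.

```python
from collections import defaultdict


def group_by_hour(rows):
    """Group parsed rows by the hour part (hh) of their timeServer field."""
    grouped = defaultdict(list)
    for row in rows:
        # timeServer has the form YYYYMMDDhhmmss, so [8:10] is the hour
        grouped[row["timeServer"][8:10]].append(row)
    return dict(grouped)


# Minimal sample rows containing only the field used for grouping
rows = [{"timeServer": "20230905000006"},
        {"timeServer": "20230905000016"},
        {"timeServer": "20230905235958"}]

print({hour: len(items) for hour, items in group_by_hour(rows).items()})
# prints {'00': 2, '23': 1}
```

A defaultdict creates the empty list on first access, which removes the explicit "if hour not in ..." check.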
Next, change the "__main__" part:
    if __name__ == '__main__':
        year = 2023
        month = 9
        day = 5

        data = getDataFromServer(year, month, day)

        # Zero-pad for printing only; getDataFromServer expects integers
        if day < 10:
            day = f"0{day}"
        if month < 10:
            month = f"0{month}"

        dataByHour = grupData(data, by="HOUR")

        sizeTotal = 0
        for familyKey in dataByHour.keys():
            size = len(dataByHour[familyKey])
            sizeTotal += size
            print(f"Number of data for key={year}{month}{day} and family={familyKey}: {size}")

        print(f"Total size: {sizeTotal} of {len(data)}")
When you run it, you will see:
    nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/get_weather_data_from_server/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/get_weather_data_from_server/main.py
    826 of 8264 (9%)
    1652 of 8264 (19%)
    2478 of 8264 (29%)
    3304 of 8264 (39%)
    4130 of 8264 (49%)
    4956 of 8264 (59%)
    5782 of 8264 (69%)
    6608 of 8264 (79%)
    7434 of 8264 (89%)
    8260 of 8264 (99%)
    Number of data for key=20230905 and family=00: 359
    Number of data for key=20230905 and family=01: 349
    Number of data for key=20230905 and family=02: 338
    Number of data for key=20230905 and family=03: 334
    Number of data for key=20230905 and family=04: 332
    Number of data for key=20230905 and family=05: 324
    Number of data for key=20230905 and family=06: 332
    Number of data for key=20230905 and family=07: 345
    Number of data for key=20230905 and family=08: 332
    Number of data for key=20230905 and family=09: 308
    Number of data for key=20230905 and family=10: 354
    Number of data for key=20230905 and family=11: 353
    Number of data for key=20230905 and family=12: 354
    Number of data for key=20230905 and family=13: 351
    Number of data for key=20230905 and family=14: 350
    Number of data for key=20230905 and family=15: 345
    Number of data for key=20230905 and family=16: 360
    Number of data for key=20230905 and family=17: 360
    Number of data for key=20230905 and family=18: 355
    Number of data for key=20230905 and family=19: 356
    Number of data for key=20230905 and family=20: 350
    Number of data for key=20230905 and family=21: 342
    Number of data for key=20230905 and family=22: 343
    Number of data for key=20230905 and family=23: 338
    Total size: 8264 of 8264
With the above in place, you are ready to write the code that uploads data to HBase. Create a new Python project; in my case it's again a PyCharm project, named upload_weather_data_to_hbase.
Next, copy and paste all the previous code and change the "__main__" part:
    import json
    import urllib.request


    def getDataFromServer(year, month, day):
        [...]


    def grupData(data, by=None):
        [...]


    if __name__ == '__main__':
        year = 2023
        month = 9
        day = 5

        uploadData(year, month, day)
Now you will implement the uploading procedure uploadData(year, month, day). Start with some testing code to be sure that your data are ready to be uploaded:
    def uploadData(year, month, day):
        toUpload = {}

        data = getDataFromServer(year, month, day)

        dataByHour = grupData(data, by="HOUR")

        for familyKey in dataByHour.keys():
            # Split data by minutes
            toUpload[familyKey] = {}

            for d in dataByHour[familyKey]:
                minutes = d["timeServer"][10:12]
                seconds = d["timeServer"][12:14]

                if minutes not in toUpload[familyKey]:
                    toUpload[familyKey][minutes] = {}

                toUpload[familyKey][minutes][seconds] = d

            # For testing: stop after the first family (hour)
            break

        for familyKey in toUpload.keys():
            print(familyKey)  # hour
            for columnKey in toUpload[familyKey].keys():
                print(f" {columnKey}")  # minutes
                print(f" {toUpload[familyKey][columnKey]}")
The above code prints data for the first familyKey. You can test it directly in PyCharm or in a terminal:
    nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/main.py > res.txt
The final part of the res.txt file contains:
    59
    {'06': {'timeServer': '20230905005906', 'time': '20230904235921', 'temperature_DS18B20_outside': '17.187500', 'temperature_DS3231': '25.500000', 'humidity_dht': '37.000000', 'temperature_dht': '16.500000', 'heat_index_dht': '15.171516', 'preasure_MPL3115A2': '1012.187500', 'temperature_MPL3115A2': '16.687500', 'adc_value_light_10kO': '1015'}, '16': {'timeServer': '20230905005916', 'time': '20230904235931', 'temperature_DS18B20_outside': '17.187500', 'temperature_DS3231': '25.500000', 'humidity_dht': '37.299999', 'temperature_dht': '17.200001', 'heat_index_dht': '15.949342', 'preasure_MPL3115A2': '1012.179993', 'temperature_MPL3115A2': '16.687500', 'adc_value_light_10kO': '1011'}, '26': {'timeServer': '20230905005926', 'time': '20230904235941', 'temperature_DS18B20_outside': '17.187500', 'temperature_DS3231': '25.500000', 'humidity_dht': '37.000000', 'temperature_dht': '16.500000', 'heat_index_dht': '15.171516', 'preasure_MPL3115A2': '1012.167480', 'temperature_MPL3115A2': '16.687500', 'adc_value_light_10kO': '1008'}, '36': {'timeServer': '20230905005936', 'time': '20230904235951', 'temperature_DS18B20_outside': '17.187500', 'temperature_DS3231': '25.500000', 'humidity_dht': '37.000000', 'temperature_dht': '16.500000', 'heat_index_dht': '15.171516', 'preasure_MPL3115A2': '1012.174988', 'temperature_MPL3115A2': '16.687500', 'adc_value_light_10kO': '1009'}, '46': {'timeServer': '20230905005946', 'time': '20230905000001', 'temperature_DS18B20_outside': '17.187500', 'temperature_DS3231': '25.500000', 'humidity_dht': '37.200001', 'temperature_dht': '17.100000', 'heat_index_dht': '15.836729', 'preasure_MPL3115A2': '1012.219971', 'temperature_MPL3115A2': '16.687500', 'adc_value_light_10kO': '1024'}, '56': {'timeServer': '20230905005956', 'time': '20230905000011', 'temperature_DS18B20_outside': '17.187500', 'temperature_DS3231': '25.500000', 'humidity_dht': '37.200001', 'temperature_dht': '16.900000', 'heat_index_dht': '15.616732', 'preasure_MPL3115A2': '1012.187500', 'temperature_MPL3115A2': '16.687500', 'adc_value_light_10kO': '1015'}}
As you can see, you have all the readings, keyed by seconds, for minute 59 of hour 00 on September 5th, 2023.
Now you will change the code to push this data to HBase:
    import happybase


    def uploadData(year, month, day):
        toUpload = {}

        data = getDataFromServer(year, month, day)

        dataByHour = grupData(data, by="HOUR")

        # Zero-pad month and day so that the row key has the form YYYYMMDD
        if day < 10:
            day = f"0{day}"
        if month < 10:
            month = f"0{month}"

        rowKey = f"{year}{month}{day}"

        for familyKey in dataByHour.keys():
            # Split data by minutes
            toUpload[familyKey] = {}

            for d in dataByHour[familyKey]:
                minutes = d["timeServer"][10:12]
                seconds = d["timeServer"][12:14]

                if minutes not in toUpload[familyKey]:
                    toUpload[familyKey][minutes] = {}

                toUpload[familyKey][minutes][seconds] = d

            # For testing: stop after the first family (hour)
            break

        connection = happybase.Connection(host='127.0.0.1',
                                          port=9090,
                                          autoconnect=True
                                          # , table_prefix=namespace
                                          # , table_prefix_separator=":"
                                          )
        connection.open()
        table = connection.table("weatherstation")

        # Write the row to the batch. When the batch size is reached,
        # rows will be sent to the database.
        batch = table.batch(batch_size=1000)

        for familyKey in toUpload.keys():
            print(familyKey)  # hour
            print(f" ", end="")
            for columnKey in toUpload[familyKey].keys():
                print(f"{columnKey}, ", end="")  # minutes
                # print(f" {toUpload[familyKey][columnKey]}")
                # One cell per minute; its value is the JSON of all readings
                jsonString = json.dumps(toUpload[familyKey][columnKey])
                jsonStringBytes = jsonString.encode('utf-8')

                batch.put(rowKey.encode("utf-8"),
                          {f"{familyKey}:{columnKey}".encode("utf-8"): jsonStringBytes})
                # Alternatively you can:
                # table.put(row_key, {column_name.encode("utf-8"): value.encode("utf-8")})
            print("")

        # If there are any rows left in the batch, send them now.
        batch.send()
        connection.close()
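The central step in the code above is turning one minute of readings into the bytes HBase stores: a row key, a family:column name and a JSON value. Isolated into a small function it could look like the sketch below; the name encode_cell is my assumption, and the function only prepares the arguments, it does not talk to HBase.

```python
import json


def encode_cell(row_key, family, column, readings):
    """Return (row key bytes, {b'family:column': JSON bytes}) for one cell."""
    qualifier = f"{family}:{column}".encode("utf-8")
    value = json.dumps(readings).encode("utf-8")
    return row_key.encode("utf-8"), {qualifier: value}


# Shortened sample: one reading (second 06) of minute 59, hour 00
row_key, cells = encode_cell("20230905", "00", "59",
                             {"06": {"timeServer": "20230905005906"}})
print(row_key)      # prints b'20230905'
print(list(cells))  # prints [b'00:59']
```

The returned pair has the shape expected by the put calls used above, so you could pass it on as batch.put(row_key, cells).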
When executed you will see:
    nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/main.py
    826 of 8264 (9%)
    1652 of 8264 (19%)
    2478 of 8264 (29%)
    3304 of 8264 (39%)
    4130 of 8264 (49%)
    4956 of 8264 (59%)
    5782 of 8264 (69%)
    6608 of 8264 (79%)
    7434 of 8264 (89%)
    8260 of 8264 (99%)
    00
     00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
Check the HBase contents by typing scan 'weatherstation' in the HBase shell:
    hbase:006:0> scan 'weatherstation'
You will see a lot of text flooding your screen, but pay attention to the final part. In my case it is:
[...] 20230905 column=00:58, timestamp=2023-09-08T17:55:35.680, value={"06": {"timeServer": "20 230905005806", "time": "20230904235821", "temperature_DS18B20_outside": "17.1875 00", "temperature_DS3231": "25.500000", "humidity_dht": "37.299999", "temperatur e_dht": "17.000000", "heat_index_dht": "15.729342", "preasure_MPL3115A2": "1012. 174988", "temperature_MPL3115A2": "16.687500", "adc_value_light_10kO": "1011"}, "16": {"timeServer": "20230905005816", "time": "20230904235831", "temperature_DS 18B20_outside": "17.187500", "temperature_DS3231": "25.500000", "humidity_dht": "37.099998", "temperature_dht": "16.600000", "heat_index_dht": "15.284124", "pre asure_MPL3115A2": "1012.187500", "temperature_MPL3115A2": "16.687500", "adc_valu e_light_10kO": "1011"}, "26": {"timeServer": "20230905005826", "time": "20230904 235841", "temperature_DS18B20_outside": "17.187500", "temperature_DS3231": "25.5 00000", "humidity_dht": "37.000000", "temperature_dht": "16.600000", "heat_index _dht": "15.281514", "preasure_MPL3115A2": "1012.195007", "temperature_MPL3115A2" : "16.687500", "adc_value_light_10kO": "1009"}, "36": {"timeServer": "2023090500 5836", "time": "20230904235851", "temperature_DS18B20_outside": "17.187500", "te mperature_DS3231": "25.500000", "humidity_dht": "37.200001", "temperature_dht": "16.799999", "heat_index_dht": "15.506734", "preasure_MPL3115A2": "1012.217529", "temperature_MPL3115A2": "16.687500", "adc_value_light_10kO": "1024"}, "46": {" timeServer": "20230905005846", "time": "20230904235901", "temperature_DS18B20_ou tside": "17.187500", "temperature_DS3231": "25.500000", "humidity_dht": "37.9000 02", "temperature_dht": "18.400000", "heat_index_dht": "17.284996", "preasure_MP L3115A2": "1012.219971", "temperature_MPL3115A2": "16.687500", "adc_value_light_ 10kO": "1014"}, "56": {"timeServer": "20230905005856", "time": "20230904235911", "temperature_DS18B20_outside": "17.187500", "temperature_DS3231": "25.500000", "humidity_dht": "37.500000", 
"temperature_dht": "17.500000", "heat_index_dht": " 16.284559", "preasure_MPL3115A2": "1012.169983", "temperature_MPL3115A2": "16.68 7500", "adc_value_light_10kO": "1024"}} 20230905 column=00:59, timestamp=2023-09-08T17:55:35.680, value={"06": {"timeServer": "20 230905005906", "time": "20230904235921", "temperature_DS18B20_outside": "17.1875 00", "temperature_DS3231": "25.500000", "humidity_dht": "37.000000", "temperatur e_dht": "16.500000", "heat_index_dht": "15.171516", "preasure_MPL3115A2": "1012. 187500", "temperature_MPL3115A2": "16.687500", "adc_value_light_10kO": "1015"}, "16": {"timeServer": "20230905005916", "time": "20230904235931", "temperature_DS 18B20_outside": "17.187500", "temperature_DS3231": "25.500000", "humidity_dht": "37.299999", "temperature_dht": "17.200001", "heat_index_dht": "15.949342", "pre asure_MPL3115A2": "1012.179993", "temperature_MPL3115A2": "16.687500", "adc_valu e_light_10kO": "1011"}, "26": {"timeServer": "20230905005926", "time": "20230904 235941", "temperature_DS18B20_outside": "17.187500", "temperature_DS3231": "25.5 00000", "humidity_dht": "37.000000", "temperature_dht": "16.500000", "heat_index _dht": "15.171516", "preasure_MPL3115A2": "1012.167480", "temperature_MPL3115A2" : "16.687500", "adc_value_light_10kO": "1008"}, "36": {"timeServer": "2023090500 5936", "time": "20230904235951", "temperature_DS18B20_outside": "17.187500", "te mperature_DS3231": "25.500000", "humidity_dht": "37.000000", "temperature_dht": "16.500000", "heat_index_dht": "15.171516", "preasure_MPL3115A2": "1012.174988", "temperature_MPL3115A2": "16.687500", "adc_value_light_10kO": "1009"}, "46": {" timeServer": "20230905005946", "time": "20230905000001", "temperature_DS18B20_ou tside": "17.187500", "temperature_DS3231": "25.500000", "humidity_dht": "37.2000 01", "temperature_dht": "17.100000", "heat_index_dht": "15.836729", "preasure_MP L3115A2": "1012.219971", "temperature_MPL3115A2": "16.687500", "adc_value_light_ 10kO": "1024"}, "56": {"timeServer": 
"20230905005956", "time": "20230905000011", "temperature_DS18B20_outside": "17.187500", "temperature_DS3231": "25.500000", "humidity_dht": "37.200001", "temperature_dht": "16.900000", "heat_index_dht": " 15.616732", "preasure_MPL3115A2": "1012.187500", "temperature_MPL3115A2": "16.68 7500", "adc_value_light_10kO": "1015"}} 1 row(s) Took 2.1347 seconds |
Can you see the text 1 row(s) in the penultimate line of the output? Yes, all your data (according to your code) were placed in one row. The key of this row is 20230905. My last column is column=00:59 and it is preceded by column=00:58.
To avoid flooding your screen, you can simply ask for the total number of rows in your table:
    hbase:006:0> count 'weatherstation'
    1 row(s)
    Took 0.8200 seconds
    => 1
Now you can upload all data for September 5th, 2023: comment out the break statement in the uploadData(year, month, day) function, run your Python code and check the result in the shell:
nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/main.py 826 of 8264 (9%) 1652 of 8264 (19%) 2478 of 8264 (29%) 3304 of 8264 (39%) 4130 of 8264 (49%) 4956 of 8264 (59%) 5782 of 8264 (69%) 6608 of 8264 (79%) 7434 of 8264 (89%) 8260 of 8264 (99%) 00 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 01 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 02 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 03 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 04 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 05 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 06 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 07 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 08 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 09 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 10 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 11 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 12 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 13 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 14 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 15 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 16 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 17 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 18 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 19 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 20 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 21 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 22 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 23 
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, |
You still have one row:
    hbase:007:0> count 'weatherstation'
    1 row(s)
    Took 0.0326 seconds
    => 1
Check that the other families are not empty. For example, try to get data from row 20230905 and column family 23:
    hbase:010:0> get 'weatherstation', '20230905', '23'
Again you get a lot of data; in my case the final part looks like this:
[...] 23:58 timestamp=2023-09-08T18:40:17.855, value={"08": {"timeServer": "20230905235808", "time": "20230905225823", "temperature_DS18B20_outside": "17.500000", "temperat ure_DS3231": "25.750000", "humidity_dht": "34.200001", "temperature_dht": "17.20 0001", "heat_index_dht": "15.868399", "preasure_MPL3115A2": "1008.835022", "temp erature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1017"}, "18": {"timeSe rver": "20230905235818", "time": "20230905225833", "temperature_DS18B20_outside" : "17.500000", "temperature_DS3231": "25.750000", "humidity_dht": "36.200001", " temperature_dht": "18.700001", "heat_index_dht": "17.570604", "preasure_MPL3115A 2": "1008.857483", "temperature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1011"}, "28": {"timeServer": "20230905235828", "time": "20230905225843", "temp erature_DS18B20_outside": "17.437500", "temperature_DS3231": "25.750000", "humid ity_dht": "35.799999", "temperature_dht": "17.900000", "heat_index_dht": "16.680 166", "preasure_MPL3115A2": "1008.820007", "temperature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1008"}, "38": {"timeServer": "20230905235838", "time": "20230905225853", "temperature_DS18B20_outside": "17.437500", "temperature_DS32 31": "25.750000", "humidity_dht": "35.099998", "temperature_dht": "16.299999", " heat_index_dht": "14.901907", "preasure_MPL3115A2": "1008.867493", "temperature_ MPL3115A2": "17.000000", "adc_value_light_10kO": "1009"}, "48": {"timeServer": " 20230905235848", "time": "20230905225903", "temperature_DS18B20_outside": "17.43 7500", "temperature_DS3231": "25.750000", "humidity_dht": "35.400002", "temperat ure_dht": "17.200001", "heat_index_dht": "15.899733", "preasure_MPL3115A2": "100 8.849976", "temperature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1006"} , "58": {"timeServer": "20230905235858", "time": "20230905225913", "temperature_ DS18B20_outside": "17.500000", "temperature_DS3231": "25.750000", "humidity_dht" : "35.599998", "temperature_dht": "17.600000", 
"heat_index_dht": "16.344948", "p reasure_MPL3115A2": "1008.840027", "temperature_MPL3115A2": "17.000000", "adc_va lue_light_10kO": "1024"}} 23:59 timestamp=2023-09-08T18:40:17.855, value={"08": {"timeServer": "20230905235908", "time": "20230905225923", "temperature_DS18B20_outside": "17.500000", "temperat ure_DS3231": "25.750000", "humidity_dht": "35.400002", "temperature_dht": "17.20 0001", "heat_index_dht": "15.899733", "preasure_MPL3115A2": "1008.844971", "temp erature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1010"}, "18": {"timeSe rver": "20230905235918", "time": "20230905225933", "temperature_DS18B20_outside" : "17.437500", "temperature_DS3231": "25.750000", "humidity_dht": "35.700001", " temperature_dht": "17.799999", "heat_index_dht": "16.567556", "preasure_MPL3115A 2": "1008.859985", "temperature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1006"}, "48": {"timeServer": "20230905235948", "time": "20230905230003", "temp erature_DS18B20_outside": "17.437500", "temperature_DS3231": "25.750000", "humid ity_dht": "34.400002", "temperature_dht": "17.400000", "heat_index_dht": "16.093 616", "preasure_MPL3115A2": "1008.847473", "temperature_MPL3115A2": "17.000000", "adc_value_light_10kO": "1021"}, "58": {"timeServer": "20230905235958", "time": "20230905230013", "temperature_DS18B20_outside": "17.437500", "temperature_DS32 31": "25.750000", "humidity_dht": "35.500000", "temperature_dht": "17.000000", " heat_index_dht": "15.682342", "preasure_MPL3115A2": "1008.822510", "temperature_ MPL3115A2": "16.937500", "adc_value_light_10kO": "1013"}} 1 row(s) Took 0.7314 seconds |
As you can see, there is no column= phrase in the output, because the column qualifier, preceded by its family name, is printed in the left margin. Notice, for example, 23:59, which indicates family 23 and column 59.
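When you read such a row through happybase, the same family:qualifier names come back as bytes keys of a dictionary. Below is a minimal sketch of splitting them, assuming keys shaped like b'23:59'. The helper name split_column is hypothetical and is not part of happybase.

```python
def split_column(column):
    """Split an HBase column name such as b'23:59' into (family, qualifier)."""
    family, _, qualifier = column.decode("utf-8").partition(":")
    return family, qualifier


# Dictionary shaped like the result of table.row(...); the values are placeholders.
row = {b"23:58": b"{}", b"23:59": b"{}"}
for column in row:
    family, qualifier = split_column(column)
    print(f"family={family} qualifier={qualifier}")
```

In this data layout the family is the hour and the qualifier is the minute, so the helper gives you both parts of the time without any extra parsing.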
Great, all your data are now in HBase.
In the final part of this tutorial you will update the code to work in the following way:
- When you execute it from the command line, you have to provide the year, month and day.
- The script should check whether data for the specified date exist in HBase.
- If the data exist, you should get them directly from HBase and print some information (for example, the outside temperature averaged by hour).
- If the data are not present in the database, you should first upload them and then perform the action from the previous step.
To complete this, keep the functions you wrote previously (getDataFromServer(year, month, day), grupData(data, by=None) and uploadData(year, month, day)) unchanged, add new functions (getData(year, month, day), printHBaseData(hbaseData) and main()), and change the final call in the main module:
import happybase
import json
import sys
import urllib.request


def getDataFromServer(year, month, day):
    [...]


def grupData(data, by=None):
    [...]


def uploadData(year, month, day):
    [...]


def getData(year, month, day):
    if day < 10:
        day = f"0{day}"
    if month < 10:
        month = f"0{month}"

    rowKey = f"{year}{month}{day}"

    connection = happybase.Connection(host='127.0.0.1'
                                      , port=9090
                                      , autoconnect=True
                                      # ,table_prefix=namespace
                                      # table_prefix_separator=":"
                                      )
    connection.open()
    table = connection.table("weatherstation")

    hbaseData = {}
    for family in range(24):
        if family < 10:
            family = f"0{family}"
        else:
            family = f"{family}"

        rowFamily = table.row(rowKey.encode("utf-8"), columns=[family.encode("utf-8")])
        if rowFamily:
            hbaseData[family] = rowFamily

    # Alternatively you can:
    # rows = table.rows([b'row-key-1', b'row-key-2'])
    # row = table.row(b'row-key', columns=[b'cf1:col1', b'cf1:col2'])

    connection.close()

    return hbaseData


def printHBaseData(hbaseData):
    for hour in hbaseData:
        parameterValueSum = 0
        parameterCount = 0
        for minutes in hbaseData[hour]:
            m = minutes.decode('utf-8')[3:]
            jsonBytes = hbaseData[hour][minutes]
            jsonString = jsonBytes.decode('utf-8')
            dataJSON = json.loads(jsonString)
            for seconds in dataJSON:
                temperature_DS18B20_outside = dataJSON[seconds]["temperature_DS18B20_outside"]
                parameterValueSum += float(temperature_DS18B20_outside)
                parameterCount += 1

        parameterValue = int(parameterValueSum / parameterCount * 100)
        parameterValue /= 100.0
        print(f"{hour} -> {parameterValue}")


def main():
    numberOfArgs = len(sys.argv) - 1

    if numberOfArgs == 3:
        year = int(sys.argv[1])
        month = int(sys.argv[2])
        day = int(sys.argv[3])

        hbaseData = getData(year, month, day)
        if not hbaseData:
            print("Requested data don't exist in database. Try to download them...")
            uploadData(year, month, day)
            hbaseData = getData(year, month, day)

        printHBaseData(hbaseData)
    elif numberOfArgs == 4:
        year = int(sys.argv[1])
        month = int(sys.argv[2])
        day = int(sys.argv[3])
        force = sys.argv[4]

        if force == "force":
            uploadData(year, month, day)
            hbaseData = getData(year, month, day)
            printHBaseData(hbaseData)
        else:
            print("Last argument should be either 'force' or empty")
    else:
        print("call: PROGNAME YEAR MONTH DAY [force]")


if __name__ == '__main__':
    main()
When done you can run it:
nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/main.py 2023 09 05
00 -> 17.34
01 -> 16.94
02 -> 16.59
03 -> 16.24
04 -> 15.96
05 -> 15.73
06 -> 15.72
07 -> 22.25
08 -> 28.71
09 -> 31.76
10 -> 32.89
11 -> 33.11
12 -> 31.19
13 -> 30.66
14 -> 30.4
15 -> 29.83
16 -> 28.85
17 -> 26.35
18 -> 24.42
19 -> 21.92
20 -> 19.92
21 -> 18.78
22 -> 18.12
23 -> 17.65
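The hourly averages printed above come from the nested loops in printHBaseData. The same aggregation can be sketched as a function that returns the values instead of printing them, which makes it easy to try without a running HBase. The names hourly_averages and sample are hypothetical, and the sample values are hand-made for illustration. Note also that this sketch rounds with round(), whereas printHBaseData truncates to two decimals, so the last digit may differ.

```python
import json


def hourly_averages(hbase_data, parameter="temperature_DS18B20_outside"):
    """Return {hour: average} for one parameter, rounded to two decimals."""
    averages = {}
    for hour, columns in hbase_data.items():
        values = []
        for cell in columns.values():
            # Each cell holds a JSON object keyed by seconds.
            readings = json.loads(cell.decode("utf-8"))
            for reading in readings.values():
                values.append(float(reading[parameter]))
        if values:  # skip hours without any readings (avoids division by zero)
            averages[hour] = round(sum(values) / len(values), 2)
    return averages


# Hand-made sample in the shape getData() returns; values are illustrative only.
sample = {
    "23": {
        b"23:58": json.dumps({
            "08": {"temperature_DS18B20_outside": "17.5"},
            "18": {"temperature_DS18B20_outside": "17.0"},
        }).encode("utf-8"),
        b"23:59": json.dumps({
            "08": {"temperature_DS18B20_outside": "18.0"},
        }).encode("utf-8"),
    }
}
print(hourly_averages(sample))
```

Passing another key, such as "humidity_dht", as the parameter argument would average a different sensor value, provided that key is present in every reading.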
And finally, test it on a new date:
nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/main.py 2023 09 03
Requested data don't exist in database. Try to download them...
400 of 4007 (9%)
800 of 4007 (19%)
1200 of 4007 (29%)
1600 of 4007 (39%)
2000 of 4007 (49%)
2400 of 4007 (59%)
2800 of 4007 (69%)
3200 of 4007 (79%)
3600 of 4007 (89%)
4000 of 4007 (99%)
00
01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 35, 37, 38, 39, 40, 41, 42, 43, 44, 47, 48, 49, 50, 51, 52, 54, 55, 57, 58,
01
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
02
01, 02, 03, 04, 05, 06, 07, 08, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
03
01, 02, 03, 04, 05, 06, 08, 09, 10, 11, 12, 14, 15, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 38, 39, 40, 41, 42, 43, 44, 45, 47, 48, 49, 51, 52, 53, 54, 55, 57, 58, 59,
04
00, 01, 02, 03, 04, 05, 06, 08, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 28, 29, 30, 31, 33, 34, 35, 36, 37, 39, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 59,
05
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 38, 40, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59,
06
03, 04, 07, 08, 10, 11, 12, 13, 14, 18, 19, 20, 21, 22, 23, 24, 25, 29, 30, 31, 34, 35, 36, 37, 38, 40, 41, 43, 44, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
07
00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 58, 59,
08
00, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 57, 59,
09
02, 03, 05, 06, 07, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
10
00, 01, 02, 03, 04, 05, 07, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 25, 27, 28, 29, 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 42, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 58, 59,
11
01, 02, 03, 04, 05, 06, 07, 08, 09, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 59,
12
00, 01, 02, 03, 04, 05, 07, 08, 09, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 56, 57, 58, 59,
13
00, 01, 02, 03, 04, 05, 06, 08, 09, 10, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
14
00, 01, 02, 03, 04, 05, 06, 07, 08, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
15
00, 02, 03, 04, 05, 08, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 37, 39, 42, 43, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 58,
16
00, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 43, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
17
00, 01, 02, 03, 04, 05, 06, 07, 08, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 30, 31, 32, 33, 34, 36, 41, 42, 44, 47, 48, 50, 51, 52, 54, 55, 56, 57,
18
00, 01, 02, 03, 04, 05, 07, 08, 09, 10, 11, 12, 13, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 42, 43, 44, 45, 47, 48, 50, 51, 52, 53, 54, 55, 57, 58, 59,
19
01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56, 58, 59,
20
00, 01, 02, 03, 04, 05, 06, 08, 09, 10, 11, 12, 13, 14, 15, 21, 22, 23, 24, 26, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 58, 59,
21
00, 01, 04, 05, 06, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
22
01, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 31, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 59,
23
00, 02, 03, 06, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
00 -> 18.4
01 -> 17.9
02 -> 17.46
03 -> 17.13
04 -> 17.23
05 -> 17.41
06 -> 17.36
07 -> 19.54
08 -> 21.69
09 -> 24.45
10 -> 29.78
11 -> 30.24
12 -> 27.66
13 -> 27.76
14 -> 27.51
15 -> 27.54
16 -> 26.91
17 -> 24.98
18 -> 23.36
19 -> 21.94
20 -> 21.09
21 -> 20.37
22 -> 19.76
23 -> 19.24
If you run it again with the same parameters, the uploading step is skipped:
nosql@nosql:~$ /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/venv/bin/python3 /home/nosql/Pulpit/hbase_weather_data_task/pycharm_projects/upload_weather_data_to_hbase/main.py 2023 09 03
00 -> 18.4
01 -> 17.9
02 -> 17.46
03 -> 17.13
04 -> 17.23
05 -> 17.41
06 -> 17.36
07 -> 19.54
08 -> 21.69
09 -> 24.45
10 -> 29.78
11 -> 30.24
12 -> 27.66
13 -> 27.76
14 -> 27.51
15 -> 27.54
16 -> 26.91
17 -> 24.98
18 -> 23.36
19 -> 21.94
20 -> 21.09
21 -> 20.37
22 -> 19.76
23 -> 19.24
You can confirm the presence of the data in HBase (there should be two rows):
hbase:010:0> count 'weatherstation'
2 row(s)
Took 0.2476 seconds
=> 2
You can also print all row keys:
hbase:011:0> count 'weatherstation', INTERVAL => 1
Current count: 1, row: 20230903
Current count: 2, row: 20230905
2 row(s)
Took 0.0352 seconds
=> 2
As you can see, the row keys are always sorted, because HBase stores rows in the lexicographic order of their keys.
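This ordering is why the zero-padded YYYYMMDD row key works well here: comparing such keys byte by byte gives the same order as comparing the dates themselves. The sketch below illustrates this in plain Python, without HBase. The helper name row_key is hypothetical, and the third date is added only for illustration.

```python
from datetime import date


def row_key(day):
    """Build a zero-padded YYYYMMDD row key from a datetime.date."""
    return day.strftime("%Y%m%d")


days = [date(2023, 9, 5), date(2023, 9, 3), date(2023, 10, 1)]

# Sorting the encoded keys as bytes mimics a lexicographic row order.
keys = sorted(row_key(d).encode("utf-8") for d in days)
print(keys)
```

Without the zero padding, a key such as 2023101 (October 1st) would sort before 202393 (September 3rd), so the chronological order would be lost.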
At this point you have a fully functional Python script for conditional uploading of data to HBase. It is conditional because only data that are not yet present are downloaded from the weather server and then uploaded to HBase. You can still repeat this process for data that are already present by specifying the force parameter in the command line call.
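One possible improvement is to let the standard argparse module validate the command line instead of counting sys.argv entries by hand. The sketch below is only one way to do it and assumes the same call shape, YEAR MONTH DAY [force]. The function name parse_args and the help text are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse YEAR MONTH DAY [force] from a list of command line arguments."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("year", type=int)
    parser.add_argument("month", type=int)
    parser.add_argument("day", type=int)
    # Optional fourth positional argument; only the literal word "force" is accepted.
    parser.add_argument("force", nargs="?", choices=["force"],
                        help="upload again even if the data are already present")
    return parser.parse_args(argv)


args = parse_args(["2023", "09", "05", "force"])
print(args.year, args.month, args.day, args.force == "force")
```

In main() you would call parse_args(sys.argv[1:]) and then branch on args.force in the same way the original code branches on the fourth argument.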
As always, there are many places where you can improve the existing code or even write something new, for example by turning it into a web application.