{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "(chpt-xml)=\n", "# Structured Data (XML, xlsx & JSON)\n", "\n", "Create, manipulate, and copy semi-structured data files in the form of xlsx-workbooks and JSON files. For interactive reading and executing code blocks [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hydro-informatics/hydro-informatics.github.io/main?filepath=jupyter) and find *xml.ipynb*, or install {ref}`Python ` and {ref}`JupyterLab ` locally.\n", "\n", "This chapter starts with background information about XML and what XML has to do with workbooks, and JSON: XML is an abbreviation for [E**x**tensible **M**arkup **L**anguage](https://www.w3.org/TR/xml/) that defines rules for encoding documents. XML was designed for straightforward usage over the internet and we encounter XML documents all the time, on websites (e.g., *XHTML*), in the shape office documents (e.g., Office Open XML such as *docx*, *pptx*, or *xlsx*), or podcasts (e.g., *RSS*). The strength of the XML format is its characteristic of being both machine-readable (i.e., a computer can process XML files) and human-readable (i.e., we can read it like a newspaper). For instance, entering formulas in an xlsx workbook is simultaneously machine-readable and human-readable, since humans and computers can interpret and evaluate the formulas in this XML frame. Other file formats such as [*JSON* (JavaScript Object Notation)](https://www.json.org/json-en.html) resemble XML and Python can extract information from and export information to both formats. In water resources engineering and research, for example, we are mainly interested in the exchange of information with office workbooks (*xlsx* files), or with JSON files, which provide boundary conditions for numerical models.\n", "\n", "```{admonition} Watch this section as a video\n", ":class: tip, dropdown\n", "\n", "

Watch this section as a video on the @Hydro-Morphodynamics channel on YouTube.

\n", "```\n", "\n", "(sec-xlsx)=\n", "## Workbook (xlsx) Handling\n", "\n", "Why do we want to communicate with workbooks at all? We have already seen that Python is much more powerful than office programs for the systematic analysis of data. However, Python requires the abstraction of data in our minds to visualize, for example, the structure of a nested list. For this reason, data from and for marketing, your boss, or public authorities are often required to have visually easy-to-use workbook formats, which can be overlooked quickly. Still, we want to leverage the content of such workbook information efficiently with Python and we want to produce visually simplistic output that anyone can read without any Python knowledge.\n", "\n", "We have already seen that *pandas* provides easy routines for importing and exporting data from and to workbooks, respectively (cf. {ref}`file reading and writing with pandas `) with *pandas*). *pandas* primarily uses [openpyxl](https://openpyxl.readthedocs.io/en/stable/) depending on what is available in the active Python environment. *openpyxl* is one of the most powerful options for handling workbooks with Python (note: this assertion is subjective) and this section introduces *openpyxl*.\n", "\n", "```{tip}\n", "*flusstools* ships with *openpyxl*.\n", "```\n", "\n", "This introduction uses the following workbook-related terms:\n", "\n", "* **workbook** is the main *xlsx* file we work with (also called *spreadsheet*);\n", "* **sheet** is the tabular content of a workbook and one workbook can have multiple sheets;\n", "* **column**s are vertical lines in a sheet;\n", "* **row**s are horizontal lines in a sheet;\n", "* **cell**s are elements of a sheet.\n", "\n", "### Create a Workbook\n", "\n", "*openpyxl* has a `Workbook` class that enables to create and fill workbooks with data. Typically, an instance of the `Workbook` class is called `wb`, and worksheet variables are called `ws`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Workbook data in cell A2: 0.3678794411714422\n", "Corresponds to np.array value: 0.3678794411714422\n" ] } ], "source": [ "import openpyxl as oxl\n", "wb = oxl.Workbook() # create a Workbook instance\n", "ws = wb.active # activate worksheet\n", "ws.titel = \"Gaussian 2D\" # name worksheet\n", "ws[\"A1\"] = \"Gaussian sample data\" # write to cell A1\n", "\n", "# generate some data\n", "x, y = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))\n", "dis = np.sqrt(x * x + y * y)\n", "sigma, mu = 1.0, 0.0\n", "gaussian = np.exp(-((dis - mu) ** 2 / (2.0 * sigma ** 2)))\n", "\n", "# write data to worksheet\n", "m, n = gaussian.shape\n", "for i in range(1, m):\n", " for j in range(1, n):\n", " _ = ws.cell(row=i+1, column=j, value=gaussian[i-1, j-1])\n", "\n", "print(\"Workbook data in cell A2: \"+ str(ws[\"A2\"].value))\n", "print(\"Corresponds to np.array value: \" + str(gaussian[0, 0]))\n", "\n", "# save and close (destruct object) workbook\n", "wb.save(filename=\"data/python_workbook.xlsx\")\n", "wb.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{figure} ../img/py-xlsx.png\n", ":alt: python excel file creation\n", ":name: py-xlsx\n", "\n", "The resulting workbook.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read and Manipulate an Existing Workbook\n", "\n", "```{warning}\n", "When *openpyxl* opens an existing workbook, it cannot read graphical objects (e.g., graphs, shapes, or images). For this reason, saving the workbook with the same name will make graphical objects disappear.\n", "```\n", "\n", "*openpyxl* reads existing workbooks with `openpyxl.load_workbook(filename=str())`. This function accepts optional keyword arguments, such as:\n", "\n", "* `read_only=BOOLEAN` decides whether a workbook is opened in *read-only* mode. A workbook can only be manipulated if `read_only=False` (this is the default option, which can be useful to handle large files or to ensure that graphical objects are not lost).\n", "* `write_only=BOOLEAN` decides whether a workbook is opened in *write only* mode. If `write_only=True` (the default is `False`), no data can be read from a workbook but writing data is significantly faster (i.e., this option is useful for writing large datasets).\n", "* `data_only=BOOLEAN` determines if cell formulae or cell data will be read. For example, when a workbook cell's content is `=PI()`, `data_only=False` (this is the default option) reads the cell value as `=PI()` and `data_only=True` reads the cell value as `3.14159265359`.\n", "* `keep_vba=BOOLEAN` controls whether *Visual Basic* elements (macros) are kept or not. The default is `keep_vba=False` (i.e., no preservation), and `keep_vba=True` will still not enable a modification of Visual Basic elements.\n", "\n", "If `read_only=False`, we can manipulate cell values and also cell formats, including data formats (e.g., date, time, and [many more](https://openpyxl.readthedocs.io/en/stable/_modules/openpyxl/styles/numbers.html)), [font properties (and many more cell styles)](https://openpyxl.readthedocs.io/en/stable/styles.html), or colors in *HEX Color Code* ([find your favorite color here](https://www.colorcodehex.com/)). The following example opens the above-created `python_workbook.xlsx`, adds a new worksheet, illustrates the implementation of cell styles, and fills the workbook with random discharge measurements." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "from openpyxl.styles import Font, Alignment, PatternFill\n", "wb = oxl.load_workbook(filename=\"data/python_workbook.xlsx\", read_only=False)\n", "ws = wb.create_sheet(title=\"Discharge\")\n", "\n", "# define title styles\n", "title_font = Font(name=\"Tahoma\", size=\"11\", bold=True, italic=True, color=\"C1D0DE\")\n", "title_fill = PatternFill(fill_type=\"solid\", start_color=\"050505\", end_color=\"073AD4\")\n", "title_align = Alignment(horizontal='center', vertical='bottom', text_rotation=0,\n", " wrap_text=False, shrink_to_fit=False, indent=0)\n", "\n", "date_time_format = \"yyyy-mm-dd hh:mm:ss\"\n", "ws[\"A1\"] = \"Date-Time (%s)\" % date_time_format\n", "\n", "title_cell_flow = ws[\"B1\"]\n", "title_cell_flow.value = \"Discharge (CMS)\"\n", "title_cell_flow.font = title_font\n", "title_cell_flow.fill = title_fill\n", "title_cell_flow.alignment = title_align\n", "\n", "# define time period and time delta of 1 hour = 3600 seconds\n", "current_date_time = datetime.datetime(2040, 12, 24, 0, 0)\n", "dt = datetime.timedelta(seconds=3600)\n", "\n", "# write random discharges to workbooks\n", "for row in ws.iter_rows(min_row=2, max_row=26, min_col=1, max_col=2):\n", " row[0].value = current_date_time\n", " row[0].number_format = date_time_format\n", " row[1].value = np.random.random_sample(size=None) * 100\n", " row[1].number_format = \"0.00\"\n", " current_date_time += dt\n", " \n", "wb.save(\"data/python_workbook_reloaded.xlsx\")\n", "wb.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{figure} ../img/py-xlsx-reloaded.png\n", ":alt: python xlsx styling\n", ":name: py-xlsx-reloaded\n", "\n", "The updated workbook.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The below code block provides the short helper function `read_columns` to read only one or more columns into a (nested) *list* (reads until the maximum number of rows, defined by `ws.rows`, in a workbook is reached). A similar function can be written for reading rows." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def read_columns(ws, start_row=0, columns=\"ABC\"):\n", " return [ws[\"{}{}\".format(column, row)].value for row in range(start_row, int(ws.rows.__sizeof__()) + 1) for column in columns]\n", "\n", "# example usage:\n", "wb = oxl.load_workbook(filename=\"data/python_workbook.xlsx\", read_only=False)\n", "ws = wb.active\n", "col_D = read_columns(ws, start_row=2, columns=\"D\")\n", "col_F = read_columns(ws, start_row=2, columns=\"F\")\n", "wb.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Challenge: Add a random test set to modified-data-wb.xlsx\n", ":class: important, dropdown\n", "The {ref}`pandas file handling section ` features the creation of a workbook containing 4 columns of test data ([download modified-data-wb.xlsx](https://raw.githubusercontent.com/hydro-informatics/jupyter-python-course/main/data/modified-data-wb.xlsx)). For a random-test comparison, you want to add a column of random values. To this end:\n", "\n", "* Open *modified-data-wb.xlsx* with *openpyxl* `wb = oxl.load_workbook(filename=\"data/python_workbook.xlsx\", read_only=False)`\n", "* Get the active worksheet `ws = wb.active`\n", "* Add a new column name in column **F**: `ws[\"F1\"].value = \"Random values\"`\n", "* Create a list (i.e., a 1d array) of random numbers with numpy (do not forget to `import numpy as np`) `rnd_data = np.random.random(18)`\n", "* Iterate over the random data array and write the values to the workbook's **F** column\n", " * Start the iteration with `for row, val in enumerate(rnd_data):`\n", " * In every iteration add the next random value of `rnd_data` with `ws[\"F\" + str(row + 2)].value = val`
*Note the usage of `row + 2` (one header column and different absolutes of Python and the workbook)*\n", "* Save and close the workbook\n", " * `wb.save(os.getcwd() + \"/data/re-modified-data-wb.xlsx\"` (do not forget to `import os`)\n", " * `wb.close()`\n", " * Alternatively, use a namespace for the workbook manipulation!\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Formulae in Workbooks\n", "\n", "The optional keyword argument `data_only=False` enables reading workbook formulae instead of cell values. However, not all workbook formulae are recognized by *openpyxl* and in the case of doubts, a dirty try-and-error approach is the only remedy. As an example, change `SQRT` in the below example to the formula in question." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "from openpyxl.utils import FORMULAE\n", "print(\"SQRT\" in FORMULAE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### (Un)merge Cells\n", "\n", "Merging and un-merging cells is a popular office function for style purposes and *openpyxl* also provides functions to perform merge operations:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "ws.merge_cells(start_row=1, end_row=3, start_column=1, end_column=2)\n", "ws.unmerge_cells(start_row=1, end_row=3, start_column=1, end_column=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Charts (Plots)\n", "\n", "In the unlikely event that you want to insert plots directly into workbooks with Python ({ref}`matplotlib ` is more powerful anyway), *openpyxl* provides features for this purpose as well. To illustrate the creation of an area chart, the below code block re-uses the first column of random values in the above-created `python_workbook.xlsx`. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from openpyxl.chart import AreaChart, Reference, Series\n", "\n", "wb = oxl.load_workbook(filename=\"data/python_workbook.xlsx\", read_only=False)\n", "ws = wb.active\n", "\n", "chart = AreaChart()\n", "chart.title = \"Random Gaussian\"\n", "chart.style = 10\n", "chart.x_axis.title = \"Cell row\"\n", "chart.y_axis.title = \"Random value (-)\"\n", "\n", "col_D = Reference(ws, min_col=4, min_row=2, max_row=20)\n", "col_F = Reference(ws, min_col=6, min_row=2, max_row=20)\n", "\n", "chart.add_data(col_F, titles_from_data=False)\n", "chart.add_data(col_D, titles_from_data=False)\n", "\n", "ws.add_chart(chart, \"B2\")\n", "\n", "wb.save(\"data/python_workbook_chart.xlsx\")\n", "wb.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{figure} ../img/py-xlsx-plot.png\n", ":alt: plot python excel file\n", ":name: py-xlsx-plot\n", "\n", "The workbook with the plot created in Python.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other workbook charts are available and their implementation (still: why would you?) is explained in the [openpyxl docs](https://openpyxl.readthedocs.io/en/stable/charts/introduction.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Customize Workbook Manipulation\n", "There are many ways of modifying workbooks and *openpyxl* provides close-to \"shovel-ready\" methods to manipulate a workbook. Still, to avoid re-reading this lesson every time you want to manipulate a workbook, it is more convenient to have your own workbook manipulation classes ready to work. To this end, the following code block defines custom `Read` and `Write` classes, where `Read` is the parent class of the `Write` class (recall the section on {ref}`inheritance of classes `). The `Read` class may contain tailored functions for reading specific columns, rows, or arrays. The below code block also makes use of the above-defined `read_columns` function, implemented as a method of the `Read` class." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import openpyxl as oxl\n", "\n", "class Read:\n", " def __init__(workbook_name=\"\", *args, **kwargs):\n", " read_only = kwargs.get(\"read_only\")\n", " data_only = kwargs.get(\"data_only\")\n", " sheet_name = kwargs.get(\"data_only\")\n", " self.wb = oxl.load_workbook(filename=workbook_name, read_only=read_only, data_only=data_only)\n", " if sheet_name:\n", " self.ws = self.wb.worksheets[worksheet]\n", " else:\n", " self.ws = self.wb[self.wb.sheetnames[0]]\n", " \n", " def read_columns(self, start_row=0, columns=\"ABC\"):\n", " return [self.ws[\"{}{}\".format(column, row)].value for row in range(start_row, len(self.ws.rows) + 1) for column\n", " in columns]\n", " \n", " def __call__(self):\n", " print(dir(self))\n", "\n", " \n", "class Write(Read):\n", " def __init__(workbook_name=\"\", *args, **kwargs):\n", " data_only = kwargs.get(\"data_only\")\n", " sheet_name = kwargs.get(\"data_only\")\n", " Read.__init__(workbook_name=workbook_name, read_only=False, data_only=data_only, sheet_name=sheet_name)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An extended example script with more complex `Read` and `Write` classes can be downloaded from the [course repository](https://github.com/hydro-informatics/material-py-codes/raw/master/workbooks/xlsx.py).\n", "\n", "```{admonition} Challenge\n", "What are your favorite fonts, table colors, or layouts? Write your own `Read` and `Write` classes with formatting methods to have a personal template ready to be used at any time.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### An Example from Water Resources Engineering and Research\n", "\n", "The ecological restoration or enhancement of rivers requires, among other data, information on preferred water depths and flow velocities of target fish species. This information is established by biologists and then often provided in the shape of so-called [habitat suitability index (HSI)](https://riverarchitect.github.io/RA_wiki/SHArC#hefish) curves in workbook formats. Typically, we produce geospatially explicit data on water depth and flow velocity with numerical models. The output of two or three-dimensional numerical models is way too large to be handled with office applications. So we need an advanced tool, such as Python, to handle the geospatially explicit data, and read and interpolate HSI curves from workbooks. What does that look like technically? The {ref}`exercises on geospatial Python ` will let you dive into aquatic habitat (assessments)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Exercise\n", "Familiarize with workbook handling in the {ref}`Sediment transport (1d) exercise `.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(json)=\n", "## JSON\n", "\n", "JavaScript Object Notation ([JSON](https://www.json.org/json-en.html)) files have a similar structure to XML and enable the structured storage of (human-readable) data. For instance, the numerical code *BASEMENT v.3.x* ({ref}`read more in the numerical modeling chapter `) uses a *model.json* and a *simulation.json* file to store model setup parameters such as material properties. Thus, automating numerical model setups with Python involves the modification of model parameters stored in *json* files. This is where Python steps in with the `JSON` package and `pandas`' JSON modules." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JSON file structure\n", "\n", "A JSON file consists of two types of data structures, which are *dictionary* objects and *array*s in the form of *lists* of values. The *dictionary* objects in a JSON file correspond to the same format that we already know in Python: Pairs of *keys* (names) and *values* embraced by curly brackets (*braces*) `{\"name\": value}`. The `value` can be a *string*, *numeric*, a comma-separated *list* `[]` (*array*) of data, or another *dictionary*.\n", "The following example shows a JSON file called `river_struct.json` with a `RIVER` key that has a nested dictionary as a value. The value-*dictionary* contains three keys (`NAME`, `GEOMETRY`, and `HYDRAULICS`).\n", "\n", "```{admonition} Take a couple of minutes to understand river_struct.json\n", "* What is the purpose of the `FLOWBOUNDARIES` in `GEOMETRY`?\n", "* How could the `FLOWBOUNDARIES` be related to the `BOUNDARY` key of `HYDRAULICS`?\n", "* What units could the `FRICTION` values correspond to?\n", "* Can you find the river on a map?\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'RIVER': {'NAME': 'Vanilla Flow',\n", " 'GEOMETRY': {'REGIONS': [{'type': 'wet', 'name': 'riverbed'},\n", " {'type': 'dry', 'name': 'floodplain'}],\n", " 'FLOWBOUNDARIES': [{'name': 'Inflow', 'nodes': [1, 3, 7, 31]},\n", " {'name': 'Outflow', 'nodes': [89, 90, 76, 69, 95]}]},\n", " 'HYDRAULICS': {'BOUNDARY': [{'discharge_file': '/simulation/directory/Inflow.txt',\n", " 'name': 'Inflow',\n", " 'slope': 0.005,\n", " 'type': 'hydrograph'},\n", " {'name': 'Outflow', 'type': 'zero_gradient'}],\n", " 'FRICTION': {'cobble': 20.0, 'gravel': 26.0, 'sand': 41}},\n", " 'LOCATION': [48.744079, 9.103928]}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{\n", "\t\"RIVER\": {\n", "\t\t\"NAME\": \"Vanilla Flow\",\n", "\t\t\"GEOMETRY\": {\n", "\t\t\t\"REGIONS\": [\n", "\t\t\t\t{\n", "\t\t\t\t \"type\": \"wet\",\n", "\t\t\t\t \"name\": \"riverbed\"\n", "\t\t\t\t},\n", "\t\t\t\t{\n", "\t\t\t\t \"type\": \"dry\",\n", "\t\t\t\t \"name\": \"floodplain\"\n", "\t\t\t\t}\n", "\t\t\t],\n", "\t\t\t\"FLOWBOUNDARIES\": [\n", "\t\t\t\t{\n", "\t\t\t\t \"name\": \"Inflow\",\n", "\t\t\t\t \"nodes\": [1, 3, 7, 31]\n", "\t\t\t\t},\n", "\t\t\t\t{\n", "\t\t\t\t \"name\": \"Outflow\",\n", "\t\t\t\t \"nodes\": [89, 90, 76, 69, 95]\n", "\t\t\t\t}\n", "\t\t\t]\n", "\t\t},\n", "\t\t\"HYDRAULICS\": {\n", "\t\t\t\"BOUNDARY\": [\n", "\t\t\t\t{\n", "\t\t\t\t\t\"discharge_file\": \"/simulation/directory/Inflow.txt\",\n", "\t\t\t\t\t\"name\": \"Inflow\",\n", "\t\t\t\t\t\"slope\": 0.005,\n", "\t\t\t\t\t\"type\": \"hydrograph\"\n", "\t\t\t\t},\n", "\t\t\t\t{\n", "\t\t\t\t\t\"name\": \"Outflow\",\n", "\t\t\t\t\t\"type\": \"zero_gradient\"\n", "\t\t\t\t}\n", "\t\t\t],\n", "\t\t\t\"FRICTION\": {\n", "\t\t\t\t\"cobble\": 20.0,\n", "\t\t\t\t\"gravel\": 26.0,\n", "\t\t\t\t\"sand\": 41\n", "\t\t\t}\n", "\t\t},\n", "\t\t\"LOCATION\": [48.744079, 9.103928]\n", "\t}\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read (Decode) and Write (Encode) JSON Files with the `json` Library \n", "\n", "JSON files can be implemented in many programming languages, including HTML and Python. This is also why Jupyter notebooks (as used in this eBook) can be run in Python and displayed as a web page. Python has a built-in `json` library that enables JSON decoding and encoding. The `json` library provides a `json.dumps(DATA)` method to \"dump\" (i.e., encode) data in JSON format. Vice versa, the `json.load()` function reads data from JSON files. The following example illustrates encoding and decoding an arbitrarily nested dataset with the `json` library." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[\"list_element1\", {\"dict_key\": [\"tuple_element\", \"text\", 1.0, null]}]\n" ] } ], "source": [ "import json\n", "# create arbitrary nested data (list, dictionary, tuple)\n", "data_for_json = [\"list_element1\", {\"dict_key\": (\"tuple_element\", \"text\", 1.0, None)}]\n", "\n", "# create a json file\n", "json_file = open(\"data/my-first.json\", mode=\"w+\")\n", "# encode the random nested data list in json format and write to file\n", "json_file.write(json.dumps(data_for_json))\n", "# close file\n", "json_file.close()\n", "\n", "# re-open the json file to read data\n", "with open(\"data/my-first.json\", mode=\"r\") as re_opened_file:\n", " raw_data = re_opened_file.readline()\n", "\n", "# decode json data in a Python variable\n", "data_from_json = json.loads(raw_data)\n", "print(json.dumps(data_from_json))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [Python docs](https://docs.python.org/3/library/json.html) provide more options and descriptions on using the `json` library. However, here we will (once again) make use of the *pandas* library, which offers powerful features for handling json data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(read-json)=\n", "### Read (Decode) and Write (Encode) JSON Files with *pandas* \n", "\n", "*pandas* (recall {ref}`data and file handling with pandas `) enables reading JSON files into its convenient table format with an embedded usage of the `json` library. The following code block uses the [pandas.read_json(FILE)](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html) function to read the above shown `RIVER` sample file ([download river_struct.json](https://raw.githubusercontent.com/hydro-informatics/jupyter-python-course/main/data/river_struct.json))." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "river = pd.read_json(\"data/river_struct.json\")\n", "print(river)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since a river without data is like ice cream without taste, we will add (random) data on flow characteristics to the data structure. Let's assume that we have used the data from `river_struct.json` to simulate a stationary discharge in a two-dimensional numerical model. As a result, we have two regular grids (arrays) with data on flow velocity and water depth. Now, we want to append both the flow velocity and water depth arrays in the form of a result structure (dictionary) to `river_struct.json` and give the river a new name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create random data\n", "import numpy as np\n", "h = np.random.weibull(np.arange(0,100)).reshape(10, 10)\n", "u = np.random.weibull(np.arange(0,100)).reshape(10, 10)\n", "\n", "# append RESULTS row to pandas dataframe\n", "river_dict = river.to_dict()\n", "river_dict[\"RIVER\"].update({\"RESULTS\": {\"water_depth\": h, \"flow_velocity\": u}})\n", "updated_river = pd.DataFrame.from_dict(river_dict)\n", "\n", "# re-NAME RIVER\n", "updated_river[\"RIVER\"][\"NAME\"] = \"Honey river\"\n", "print(updated_river)\n", "\n", "# export to JSON\n", "updated_river.to_json(\"data/river_results.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{figure} ../img/py-json-file.png\n", ":alt: python json file creation manipulation\n", ":name: py-json-file\n", "\n", "The resulting workbook.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{admonition} Exercise\n", "Familiarize with JSON file handling in the {ref}`geospatial ecohydraulics ` exercise (requires understanding the full {ref}`geospatial Python chapter `).\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Learning Success Check-up\n", "\n", "Take the [learning success test for this Jupyter notebook](https://forms.gle/9PkUhSdr37aRnxGy5).\n", "\n", "````{admonition} Unfold QR Code\n", ":class: tip, dropdown\n", "\n", "```{image} ../img/qr-codes/gle-pyxml.png\n", "```\n", "````" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }