Committed 08 May 2025 08:55PM UTC coverage: 80.239% (+0.05%) from 80.194%

Build # 14916511482

Build Type

push

github

Committed by

jharwell

Commit Message

feature(#326): Arrow storage

- Start updating docs/code to say "output files" instead of "csv"

- Move flattening to be a platform callback so it can be done before scaffolding
  a batch exp.

- Start hacking at statistics generation to support arrow and CSV. Things seem
  to work with arrow, but need to re-run some imagizing/csv tests to verify
  things aren't broken in other ways.

- Add a placeholder for fleshing out SIERRA's dataflow model, which is a really
  important aspect of usage which currently isn't documented.

- Remove excessive class usage in DataFrame{Reader,Writer}

- Overhaul collation and fix nasty bug where data was only being gathered from 1
  run per sim; no idea how long that has been in there. Added an assert so that
  can't happen again.

Run Details

349 of 385 new or added lines in 28 files covered. (90.65%)

3 existing lines in 3 files now uncovered.

5441 of 6781 relevant lines covered (80.24%)

0.8 hits per line

Source File

75.0

/sierra/plugins/storage/arrow/plugin.py

# Copyright 2025 John Harwell, All rights reserved.
#
#  SPDX-License-Identifier: MIT
"""
Plugin for reading/writing Apache .arrow files.
"""

# Core packages
import pathlib
import typing as tp

# 3rd party packages
from retry import retry
import pandas as pd

# Project packages


def suffixes() -> tp.Set[str]:
    return {'.arrow'}


@retry(pd.errors.ParserError, tries=10, delay=0.100, backoff=1.1)  # type:ignore
def df_read(path: pathlib.Path, **kwargs) -> pd.DataFrame:
    """
    Read a pandas dataframe from an Apache .arrow file.
    """
    return pd.read_feather(path)


@retry(pd.errors.ParserError, tries=10, delay=0.100, backoff=1.1)  # type:ignore
def df_write(df: pd.DataFrame, path: pathlib.Path, **kwargs) -> None:
    """
    Write a pandas dataframe to an Apache .arrow file.
    """
    df.to_feather(path)
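
The `@retry(pd.errors.ParserError, tries=10, delay=0.100, backoff=1.1)` decorators above come from the third-party `retry` package. As a rough illustration of what those parameters mean, the sketch below reproduces similar behavior using only the standard library: up to `tries` attempts, a wait of `delay` seconds after the first failure, and the wait multiplied by `backoff` after each later failure. The names `retry_sketch`, `flaky_read`, and the injectable `sleep` argument are hypothetical and exist only for this example; this is not the package's actual implementation.

```python
import functools
import time
import typing as tp


def retry_sketch(exc: tp.Type[BaseException],
                 tries: int,
                 delay: float,
                 backoff: float,
                 sleep: tp.Callable[[float], None] = time.sleep):
    """Hypothetical stand-in for a retry-with-backoff decorator.

    Only exceptions of type ``exc`` trigger a retry; anything else
    propagates immediately.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc:
                    # Out of attempts: let the last failure propagate.
                    if attempt == tries:
                        raise
                    sleep(wait)
                    wait *= backoff  # Grow the wait before the next attempt.
        return wrapper
    return decorator


# Record the waits in a list instead of actually sleeping, so the
# example runs instantly.
waits: tp.List[float] = []
calls = {'n': 0}


@retry_sketch(ValueError, tries=10, delay=0.100, backoff=1.1,
              sleep=waits.append)
def flaky_read() -> str:
    """Fail three times, then succeed (simulates a not-yet-ready file)."""
    calls['n'] += 1
    if calls['n'] < 4:
        raise ValueError("file not fully written yet")
    return "ok"


result = flaky_read()
```

Under these semantics, a call that fails on every attempt with the plugin's parameters would wait nine times, roughly 1.36 seconds in total, before the tenth failure propagates to the caller. Note that only the listed exception type is retried, so the choice of exception determines whether the decorator ever takes effect.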

Line coverage for /sierra/plugins/storage/arrow/plugin.py: 9 of 12 relevant lines covered (75.0%).

Covered lines (1 hit each): the four import statements, the three `def` lines, and both `@retry` decorator lines.

Uncovered new lines:

- Line 20: `return {'.arrow'}` in suffixes()

- Line 28: `return pd.read_feather(path)` in df_read()

- Line 36: `df.to_feather(path)` in df_write()
