File Modified Time HWM

class etl_entities.hwm.file.file_mtime_hwm.FileModifiedTimeHWM(*, name: str, description: str = '', directory: AbsolutePath | None = None, value: datetime | None = None, expression: Any = None, modified_time: datetime = None)

HWM based on tracking file modification time.

Warning

This HWM type is not very precise: some filesystems store modification time with only whole-second precision, so files created within the same second may be skipped.

Also, this HWM should not be used if a file's modification time can change after the file was already handled by a previous ETL run, as this could lead to reading the same file twice.
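Both pitfalls can be illustrated with plain datetime values. This is a minimal stdlib-only sketch, not the library's code. It assumes a file counts as covered when its modification time is not newer than the stored HWM value. The helpers `is_covered` and `truncate_to_second` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def is_covered(file_mtime: datetime, hwm_value: Optional[datetime]) -> bool:
    # Assumed rule: a file is already handled if its mtime is not newer than the HWM
    return hwm_value is not None and file_mtime <= hwm_value


def truncate_to_second(dt: datetime) -> datetime:
    # Mimics a filesystem that stores mtime with whole-second precision
    return dt.replace(microsecond=0)


# Pitfall 1: file A is handled at 11:22:33.1, file B appears at 11:22:33.9
file_a = truncate_to_second(datetime(2025, 1, 1, 11, 22, 33, 100000, tzinfo=timezone.utc))
hwm_value = file_a  # the HWM now stores 11:22:33
file_b = truncate_to_second(datetime(2025, 1, 1, 11, 22, 33, 900000, tzinfo=timezone.utc))
b_skipped = is_covered(file_b, hwm_value)  # True: B looks already handled

# Pitfall 2: file A is touched after it was handled, so its mtime moves forward
touched_a = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
a_reread = not is_covered(touched_a, hwm_value)  # True: A is picked up again

print(b_skipped, a_reread)  # True True
```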

Added in version 2.5.0.

Parameters:
name : str

HWM unique name

value : datetime.datetime or None, default: None

HWM value

directory : pathlib.Path, default: None

Directory for HWM value.

description : str, default: ""

Description of HWM

expression : Any, default: None

HWM expression

modified_time : datetime.datetime, default: current datetime

HWM value modification time

Examples

from datetime import datetime, timezone
from etl_entities.hwm import FileModifiedTimeHWM

hwm = FileModifiedTimeHWM(
    name="hwm_name",
    value=datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=timezone.utc),
)
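To show how a value like this can drive incremental file processing, here is a stdlib-only sketch of the general idea. It does not use etl_entities. The helpers `mtime_of`, `select_new_files` and `run_once` are hypothetical. The sketch assumes that new files are those whose mtime is strictly greater than the stored value, and that the value then advances to the newest mtime handled.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional, Tuple


def mtime_of(path: Path) -> datetime:
    # st_mtime is a float number of seconds since the Unix epoch
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def select_new_files(directory: Path, hwm_value: Optional[datetime]) -> List[Path]:
    # With no HWM yet, every file is new; otherwise keep files newer than the HWM
    files = [p for p in directory.iterdir() if p.is_file()]
    return sorted(p for p in files if hwm_value is None or mtime_of(p) > hwm_value)


def run_once(directory: Path, hwm_value: Optional[datetime]) -> Tuple[List[Path], Optional[datetime]]:
    new_files = select_new_files(directory, hwm_value)
    # ... process new_files here ...
    if new_files:
        # Advance the HWM to the newest modification time handled in this run
        hwm_value = max(mtime_of(p) for p in new_files)
    return new_files, hwm_value


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)

    first = directory / "first.csv"
    first.write_text("a\n")
    os.utime(first, (1_700_000_000, 1_700_000_000))  # pin mtime for a repeatable demo
    handled, hwm_value = run_once(directory, None)
    first_run = [p.name for p in handled]  # ['first.csv']

    second = directory / "second.csv"
    second.write_text("b\n")
    os.utime(second, (1_700_000_100, 1_700_000_100))
    handled, hwm_value = run_once(directory, hwm_value)
    second_run = [p.name for p in handled]  # ['second.csv']

    handled, hwm_value = run_once(directory, hwm_value)
    third_run = [p.name for p in handled]  # []: nothing is newer than the HWM
```

Each run only sees files modified after the previous run's newest file. This is also why same-second files and later `touch` calls affect the result.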

copy(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, update: DictStrAny | None = None, deep: bool = False) -> Model

Duplicate a model, optionally choosing which fields to include, exclude, and change.

Parameters:
  • include – fields to include in the new model

  • exclude – fields to exclude from the new model; as with values, this takes precedence over include

  • update – values to change or add in the new model. Note: the data is not validated before the new model is created, so you should trust this data

  • deep – set to True to make a deep copy of the model

Returns:

new model instance

covers(value: datetime | int | float | PathWithStats) -> bool

Return True if the input value is already covered by the HWM.

Examples

>>> from datetime import datetime
>>> from pathlib import Path
>>> from etl_entities.hwm import FileModifiedTimeHWM
>>> hwm = FileModifiedTimeHWM(
...     name="hwm_name",
...     value=datetime(2025, 1, 1, 11, 22, 33, 456789),
... )
>>> hwm.covers(Path("/some/old_file.py"))  # path not in HWM
False
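The signature accepts a datetime, a numeric timestamp, or a PathWithStats. Below is a rough stdlib-only sketch of how such inputs could be normalized and compared. It assumes numbers are Unix timestamps and that anything not newer than the HWM value is covered. Here `pathlib.Path` stands in for PathWithStats, and `to_datetime` and `covers` are hypothetical helpers, not the library's implementation.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Union

Value = Union[datetime, int, float, Path]


def to_datetime(value: Value) -> datetime:
    # Assumption: Path inputs are stat()-ed, numbers are Unix timestamps
    if isinstance(value, datetime):
        return value
    if isinstance(value, Path):
        value = value.stat().st_mtime
    return datetime.fromtimestamp(value, tz=timezone.utc)


def covers(hwm_value: Optional[datetime], value: Value) -> bool:
    # An empty HWM covers nothing; otherwise anything not newer than it is covered
    if hwm_value is None:
        return False
    return to_datetime(value) <= hwm_value


hwm_value = datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=timezone.utc)
older_dt = covers(hwm_value, datetime(2025, 1, 1, tzinfo=timezone.utc))  # True
newer_ts = covers(hwm_value, hwm_value.timestamp() + 1)  # False
empty_hwm = covers(None, 0)  # False

with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old_file.py"
    old_file.write_text("")
    os.utime(old_file, (1_700_000_000, 1_700_000_000))  # mtime in November 2023
    old_path = covers(hwm_value, old_file)  # True
```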

dict(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) -> DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

json(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any) -> str

Generate a JSON representation of the model. The include and exclude arguments behave as in dict().

encoder is an optional function passed as default to json.dumps(); other keyword arguments are passed to json.dumps() as well.

set_value(value: ValueType | None) -> HWMType

Replaces the current HWM value with the passed one and returns the HWM.

Note

Changes the HWM value in place instead of returning a new HWM.

Returns:
result : HWM

Self

Examples

>>> from etl_entities.hwm import ColumnIntHWM
>>> hwm = ColumnIntHWM(value=1, name="my_hwm")
>>> hwm = hwm.set_value(2)
>>> hwm.value
2

update(value: datetime | int | float | PathWithStats | Iterable[PathWithStats]) -> FileModifiedTimeHWMType

Updates the current HWM value using implementation-specific logic and returns the HWM.

Note

Changes the HWM value in place.

Returns:
result : FileModifiedTimeHWM

Self

Examples

>>> from datetime import datetime
>>> from pathlib import Path
>>> from etl_entities.hwm import FileModifiedTimeHWM
>>> hwm = FileModifiedTimeHWM(
...     name="hwm_name",
...     value=datetime(2025, 1, 1, 11, 22, 33, 456789),
... )
>>> # old file is already covered
>>> hwm.update(Path("/some/old_file.py")).value
datetime.datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=...)
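The example above shows that an older file leaves the value unchanged. Below is a minimal sketch of that behavior. It assumes the update keeps the newest modification time seen so far. The function `update_mtime_hwm` is hypothetical. For brevity it only accepts datetimes, whereas the documented method also accepts timestamps and PathWithStats objects.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Union


def update_mtime_hwm(
    hwm_value: Optional[datetime],
    value: Union[datetime, Iterable[datetime]],
) -> Optional[datetime]:
    # Accept a single mtime or an iterable of mtimes (e.g. one per listed file)
    candidates = [value] if isinstance(value, datetime) else list(value)
    if not candidates:
        return hwm_value
    newest = max(candidates)
    # A high-water mark only moves forward; older values leave it unchanged
    if hwm_value is None or newest > hwm_value:
        return newest
    return hwm_value


hwm_value = datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=timezone.utc)
unchanged = update_mtime_hwm(hwm_value, datetime(2024, 6, 1, tzinfo=timezone.utc))
advanced = update_mtime_hwm(
    hwm_value,
    [
        datetime(2025, 1, 2, tzinfo=timezone.utc),
        datetime(2025, 1, 3, tzinfo=timezone.utc),
    ],
)
print(unchanged == hwm_value, advanced)  # True 2025-01-03 00:00:00+00:00
```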