File Modified Time HWM
- class etl_entities.hwm.file.file_mtime_hwm.FileModifiedTimeHWM(*, name: str, description: str = '', directory: AbsolutePath | None = None, value: datetime | None = None, expression: Any = None, modified_time: datetime = None)
HWM based on tracking file modification time.
Warning
This HWM type is not very precise: some filesystems store modification time with whole-second precision, so files created within the same second may be skipped.
Also, this HWM should not be used if a file's modification time can change after the file has already been handled by a previous ETL process run. This could lead to reading the same file twice.
Added in version 2.5.0.
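The precision caveat in the warning above can be illustrated with a small standard-library sketch. It does not use etl_entities; `truncate_to_seconds` and `is_newer` are hypothetical helpers, and the "process only files newer than the stored value" rule is an assumption made for illustration.

```python
from datetime import datetime, timezone


def truncate_to_seconds(moment: datetime) -> datetime:
    """Mimic a filesystem that stores mtime with whole-second precision."""
    return moment.replace(microsecond=0)


def is_newer(mtime: datetime, hwm_value: datetime) -> bool:
    """Hypothetical selection rule: process only files newer than the HWM value."""
    return mtime > hwm_value


# Two files written 400 ms apart, within the same wall-clock second.
first_written = datetime(2025, 1, 1, 11, 22, 33, 100_000, tzinfo=timezone.utc)
second_written = datetime(2025, 1, 1, 11, 22, 33, 500_000, tzinfo=timezone.utc)

# The first run handles the first file and stores its truncated mtime as the HWM value.
hwm_value = truncate_to_seconds(first_written)

# The second file arrives later, but its truncated mtime equals the HWM value.
second_mtime = truncate_to_seconds(second_written)
second_is_skipped = not is_newer(second_mtime, hwm_value)

# With sub-second precision the same file would be picked up.
second_is_seen_with_full_precision = is_newer(second_written, first_written)
```

Under these assumptions the second file is indistinguishable from the first once timestamps are truncated, which is the situation the warning describes.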
- Parameters:
  - name : str
    HWM unique name
  - value : datetime.datetime or None, default: None
    HWM value
  - directory : pathlib.Path, default: None
    Directory for HWM value
  - description : str, default: ""
    Description of HWM
  - expression : Any, default: None
    HWM expression
  - modified_time : datetime.datetime, default: current datetime
    HWM value modification time
Examples
from datetime import datetime, timezone
from etl_entities.hwm import FileModifiedTimeHWM

hwm = FileModifiedTimeHWM(
    name="hwm_name",
    value=datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=timezone.utc),
)
- copy(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, update: DictStrAny | None = None, deep: bool = False) -> Model
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters:
include – fields to include in new model
exclude – fields to exclude from new model, as with values this takes precedence over include
update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
deep – set to True to make a deep copy of the model
- Returns:
new model instance
- covers(value: datetime | int | float | PathWithStats) -> bool
Return True if input value is already covered by HWM.
Examples
>>> from datetime import datetime
>>> from pathlib import Path
>>> from etl_entities.hwm import FileModifiedTimeHWM
>>> hwm = FileModifiedTimeHWM(
...     name="hwm_name",
...     value=datetime(2025, 1, 1, 11, 22, 33, 456789),
... )
>>> hwm.covers(Path("/some/old_file.py"))  # path not in HWM
False
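As a rough illustration of what a modification-time coverage check can look like, here is a standard-library sketch that does not use etl_entities. The rule "a file is covered when its mtime is not later than the stored value" is an assumption, not something this page states, and `file_mtime` and `is_covered` are hypothetical helpers.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def file_mtime(path: Path) -> datetime:
    """Read the file modification time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def is_covered(path: Path, hwm_value: Optional[datetime]) -> bool:
    """Assumed rule: a file is covered when its mtime is not after the HWM value."""
    if hwm_value is None:
        return False  # nothing has been recorded yet
    return file_mtime(path) <= hwm_value


hwm_value = datetime(2025, 1, 1, 11, 22, 33, tzinfo=timezone.utc)

with tempfile.TemporaryDirectory() as tmp:
    before_file = Path(tmp) / "before.txt"
    after_file = Path(tmp) / "after.txt"
    before_file.write_text("before")
    after_file.write_text("after")

    # Force known modification times: one hour before and one hour after the HWM value.
    before_ts = hwm_value.timestamp() - 3600
    after_ts = hwm_value.timestamp() + 3600
    os.utime(before_file, (before_ts, before_ts))
    os.utime(after_file, (after_ts, after_ts))

    before_covered = is_covered(before_file, hwm_value)
    after_covered = is_covered(after_file, hwm_value)
    covered_without_value = is_covered(before_file, None)
```

The sketch reads real file metadata through `Path.stat()`, so it only makes sense for paths that exist on the local filesystem.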
- dict(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) -> DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- json(*, include: AbstractSetIntStr | MappingIntStrAny | None = None, exclude: AbstractSetIntStr | MappingIntStrAny | None = None, by_alias: bool = False, skip_defaults: bool | None = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Callable[[Any], Any] | None = None, models_as_dict: bool = True, **dumps_kwargs: Any) -> str
Generate a JSON representation of the model. The include and exclude arguments work as in dict().
encoder is an optional function to supply as default to json.dumps(); other arguments are passed as per json.dumps().
- set_value(value: ValueType | None) -> HWMType
Replaces the current HWM value with the passed one, and returns the HWM.
Note
Changes the HWM value in place instead of returning a new one.
- Returns:
  - result : HWM
    Self
Examples
>>> from etl_entities.hwm import ColumnIntHWM
>>> hwm = ColumnIntHWM(value=1, name="my_hwm")
>>> hwm = hwm.set_value(2)
>>> hwm.value
2
- update(value: datetime | int | float | PathWithStats | Iterable[PathWithStats]) -> FileModifiedTimeHWMType
Updates the current HWM value with some implementation-specific logic, and returns the HWM.
Note
Changes the HWM value in place.
- Returns:
  - result : FileModifiedTimeHWM
    Self
Examples
>>> from datetime import datetime
>>> from pathlib import Path
>>> from etl_entities.hwm import FileModifiedTimeHWM
>>> hwm = FileModifiedTimeHWM(
...     name='hwm_name',
...     value=datetime(2025, 1, 1, 11, 22, 33, 456789),
... )
>>> # old file is already covered
>>> hwm.update(Path("/some/old_file.py")).value
datetime.datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=...)
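The "implementation-specific logic" is not spelled out here. One plausible reading, consistent with the example above where an old file leaves the value unchanged, is "keep the latest timestamp seen so far". The sketch below shows that rule with the standard library only; `advance` is a hypothetical helper, not part of etl_entities, and the rule itself is an assumption.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def advance(current: Optional[datetime], candidates: Iterable[datetime]) -> Optional[datetime]:
    """Assumed update rule: keep the latest timestamp seen so far."""
    latest = current
    for candidate in candidates:
        if latest is None or candidate > latest:
            latest = candidate
    return latest


current = datetime(2025, 1, 1, 11, 22, 33, 456789, tzinfo=timezone.utc)
older = datetime(2024, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
newer = datetime(2025, 1, 2, 0, 0, 0, tzinfo=timezone.utc)

# An older timestamp leaves the value untouched.
unchanged = advance(current, [older])

# A newer timestamp moves the value forward.
moved = advance(current, [older, newer])

# With no stored value yet, the latest candidate becomes the value.
initial = advance(None, [older, newer])
```

Unlike `update`, which is documented as changing the HWM in place, this helper returns a new value and leaves its inputs alone, purely to keep the sketch easy to follow.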