Skip to content

simetrikinc/hieroskopia

Repository files navigation

Hieroskopia

codecov

The hiereskopia package is a library to infer properties like date formats or numeric separators in pandas series of type object or string.

Support

Date-times:

  • Support to dates and datetime format
  • This library receive a series as input and try to return a dictionary with the format found in the series Based on the 1989 C (Default) , Snowflake Standard or Java Simple date time format code.

Numeric:

  • This library receive a series as input and try to return a dictionary with the three digit and decimal character separator

Usage

Infer datetime or date

>>> from hieroskopia import InferDatetime
>>> InferDatetime.infer(pd.Series(["2019-11-27",
                     "2019/11/28",
                     "2018-11-08"]))
>>> {'formats': ['%Y-%m-%d', '%Y/%m/%d'], 'type':'date'}

Using return_format parameter

>>> from hieroskopia import InferDatetime
>>> InferDatetime.infer(pd.Series(["2019-11-27",
                     "2019/11/28",
                     "2018-11-08"]), return_format='snowflake')
>>> {'formats': ['yyyy-mm-dd', 'yyyy/mm/dd'], 'type':'date'}
>>> from hieroskopia import InferDatetime
>>> InferDatetime.infer(pd.Series(["2019-11-27",
                     "2019/11/28",
                     "2018-11-08"]), return_format='java')
>>> {'formats': ['yyyy-MM-dd', 'yyyy/MM/dd'], 'type':'date'}

The above method works with a best guess approach to detect a format in a object type series and try to return a datetime.strftime/strptime, Snowflake Date format, Java Simple Date Format format that will cover or parse the majority of the samples.

Infer numeric

>>> from hieroskopia import InferNumeric
>>> InferNumeric.infer(pd.Series(['767313628196.2', '76731362819.546', '767313628196']))
>>> {'three_digit_separator': '', 'decimal_separator': '.', 'type':'float'}

The above method will try to detect and return certain properties in a object type series like datatype, three_digit_separator or decimal_separator character that will cover the majority of the samples.

About

The hiereskopia package is a library to infer properties like date formats or numeric separators in pandas series of type object or string.

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

 
 
 

Contributors