Run a custom classification wrangle on the specified column or columns. A classification wrangle must be trained first.
Click here to learn how to use Classify Wrangles in Excel.
SampleParameterswrangles:
- classify:
input: Products
output: Category
model_id: ${model_id}
where: Products = Milk
| | |
|
→
|
Products |
Category |
Rice |
|
Milk |
Dairy |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
✓ |
str, list |
|
model_id |
✓ |
str |
ID of the classification model to be used |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Add or Subtract time from a date
SampleParameterswrangles:
- date_calculator:
input: Date
output: New Date
operation: subtract # Optional default is addition
time_unit: days
time_value: 1
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, pandas timestamp |
|
output |
|
str, pandas timestamp |
|
operation |
|
str |
Date operation, defaults to add. |
time_unit |
|
str |
Time unit for operation, defaults to none. See below for options |
time_value |
|
float |
Time unit value for operation, defaults to none. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
List of time units
- years
- months
- weeks
- days
- hours
- minutes
- seconds
- milliseconds
Filter the dataframe based on the contents.
SampleParameters¶ Filtering a Column
wrangles:
# Select only red fruits
- filter:
input: Color
equal:
- red
| | |
Fruit |
Color |
Apple |
red |
Apple |
green |
Orange |
orange |
Strawberry |
red |
|
→
|
Fruit |
Color |
Apple |
red |
Strawberry |
red |
|
If input is not provided, where must be used. See Using Filters and Where for more information.
Parameter |
Required |
Data Type |
Notes |
input |
|
list |
Name of the column to filter on. If multiple are provided, all must match the criteria. Where parameter must be used if not provided. Defaults to an empty list. |
equal |
|
str, list |
Select rows where the values equal a given value. Defaults to none. |
not_equal |
|
str, list |
Select rows where the values do not equal a given value. Defaults to none. |
is_in |
|
str, list |
Select rows where the values are in a given list. Defaults to none. |
not_in |
|
str, list |
Select rows where the values are not in a given list. Defaults to none. |
greater_than |
|
int, float |
Select rows where the values are greater than a specified value. Does include the value itself. Defaults to none. |
greater_than_equal_to |
|
int, float |
Select rows where the values are greater than a specified value. Does include the value itself. Defaults to none. |
less_than |
|
int, float |
Select rows where the values are less than a specified value. Does not include the value itself. Defaults to none. |
less_than_equal_to |
|
int, float |
Select rows where the values are less than a specified value. Does include the value itself. Defaults to none. |
between |
|
list |
Value or list of values to filter that are in between two parameter values. Defaults to none. |
contains |
|
str |
Select rows where the input contains the value. Allows regular expressions. Defaults to none. |
not_contains |
|
str |
Select rows where the input does not contain the value. Allows regular expressions. Defaults to none. |
is_null |
|
bool |
If true, select all rows where the value is NULL. If false, where is not NULL. Defaults to none. |
where |
|
str |
Use a SQL WHERE clause to filter the data. Input parameter must be used if where is not provided. It is suggested to use where independently of all other parameters except where_params. Defaults to none. |
where_params |
list, obj |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name). |
|
If input is not provided, where must be used. See Using Filters and Where for more information.
Look up data from a saved Lookup Wrangle
Click here to learn how to use Lookup Wrangles in Excel.
SampleParameterswrangles:
- lookup:
input: State
output:
- Abbreviation
model_id: 55555555-5555-5555
| | |
State |
Texas |
New York |
Virginia |
|
→
|
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str |
The input column that contains the keys to be looked up in the Lookup Wrangle |
output |
|
str, list |
Output columns from Lookup Wrangle. If not present a dictionary with output values will be returned. |
model_id |
✓ |
str |
ID of the classification model to be used |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Added v0.5
Apply mathematical calculations to columns. Also called as maths
Note: Spaces within column headers are replaced with an underscore automatically. In order for this wrangle to function properly, this must be taken into account in the input.
SampleParameterswrangles:
- math:
input: sqrt(Values)
output: Square Root
| | |
|
→
|
Values |
Square Root |
4 |
2 |
9 |
3 |
16 |
4 |
|
Math supports the set of operators listed below:
- Logical operators:
&, |, ~
- Comparison operators:
<, <=, ==, !=, >=, >
- Unary arithmetic operators:
-
- Binary arithmetic operators:
+, -, *, /, **, %, <<, >>
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str |
The mathematical expression using column names. e.g. column1 * column2 + column3 |
output |
✓ |
str |
The column to output the results to |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Added v1.6
The Python Wrangle allows executing simple Python commands inline within a recipe. Row values are referenced by the column name and commands are evaluated once per row. Spaces within column names are replaced by underscores (_). Additionally, all columns are available as a dict named kwargs. For more complex Python, use Custom Functions.
Note, this evaluates the python command - be especially cautious including variables from untrusted sources within the command string.
SampleParameterswrangles:
- python:
output: result
command: My_Column.upper()
My Column |
result |
example text |
EXAMPLE TEXT |
Note: The python wrangle replaces spaces in column headers with an underscore so it is important to keep this in mind when writing your commands.
Parameter |
Required |
Data Type |
Notes |
input |
|
str, list |
Name or list of input column(s) to filter the data available to the command. Useful in conjunction with kwargs to target a variable range of columns. |
output |
✓ |
str, list |
Name or list of output column(s). To output multiple columns, return a list of the corresponding length. |
command |
✓ |
str |
Python command. This must return a value. |
The python wrangle supports including parameters which allows you to parameterize a variable of unknown origin safely. For example, if you wanted to slice a string or a list like shown below:
wrangles:
- python:
output: sliced
command: input_column[:i]
i: ${var}
In the above code, i is a variable who's value will be dictated by the user.
Added v0.5
Run a recipe as a wrangle.
SampleParameterswrangles:
- recipe:
name: recipe1.wrgl.yaml
variables: ${variable_to_pass}
The output columns of the Wrangle-Recipe will be added to the dataframe.
Parameter |
Required |
Data Type |
Notes |
name |
✓ |
str |
file name/path of the recipe or the model id |
variables |
|
dict |
A dictionary of variables to pass to the recipe, defaults to empty dictionary |
output_columns |
|
str, list |
Columns to output from the recipe, defaults to none. |
functions |
|
str, list |
A list of functions used in the recipe |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Conform DataFrame to new index with optional filling logic.
SampleParameterswrangles:
- reindex:
index:
- 5
- 4
- 3
- 2
- 1
| | |
|
Product Data |
Material |
1 |
SKF ball brg |
Ceramic |
2 |
brg seal |
Rubber |
3 |
Ball valve |
Brass |
4 |
Ceramic cartridge |
Ceramic |
5 |
Needle Bearing |
Stainless Steel |
|
→
|
|
Product Data |
Material |
5 |
Needle Bearing |
Stainless Steel |
4 |
Ceramic cartridge |
Ceramic |
3 |
Ball valve |
Brass |
2 |
brg seal |
Rubber |
1 |
SKF ball brg |
Ceramic |
|
Parameter |
Required |
Data Type |
Notes |
axis |
|
str, int |
Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1). |
columns |
|
str, list |
New labels for the columns. Preferably an Index object to avoid duplicating data. |
index |
|
str, list |
New labels for the index. Preferably an Index object to avoid duplicating data. |
lebels |
|
str, list |
New labels / index to conform the axis specified by ‘axis’ to. |
Note: Cannot specify both 'axis' and any of 'index' or 'columns'
Reindex is not compatible with where filtering
Remove all the elements that occur in one list from another.
SampleParameters¶ Removing Words From a Column
wrangles:
- remove_words:
input: Description
to_remove: # To Remove columns must be list
- Materials
- Colours
output: Product
tokenize_to_remove: True
ignore_case: False
| | |
Description |
Materials |
Colours |
Steel Blue Bottle |
['Steel'] |
['Blue'] |
['Steel', 'Blue', 'Bottle'] |
['Steel'] |
['Blue'] |
|
→
|
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
|
str, list |
If omitted, overwrites input |
to_remove |
✓ |
str, list |
Column or list of columns with a list of words to be removed |
tokenize_to_remove |
|
bool |
Tokenize all to_remove inputs, defaults to false |
ignore_case |
|
bool |
Ignore input and to_remove case, defaults to true |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Quick find and replace for simple values. Can use regex in the find field.
Click here to learn how to use Replace Wrangles in Excel.
SampleParameterswrangles:
- replace:
input: Product Data
find: brg
replace: bearing
| | |
Product Data |
SKF ball brg |
brg seal |
|
→
|
Product Data |
SKF ball bearing |
bearing seal |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
|
str, list |
Overwrites input if none given |
find |
✓ |
str |
String or regex pattern to find |
replace |
✓ |
str |
Value to replace the string or regex pattern |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Round numbers in a column to the nearest decimal point of your choosing.
Click here to learn how to use Round Wrangles in Excel.
SampleParameters¶ Rounding a Column
wrangles:
- round:
input: Cost Per Unit
output: Cost Rounded
decimals: 2
| | |
Cost Per Unit |
3.14159 |
2.71828 |
|
→
|
Cost Per Unit |
Cost Rounded |
3.14159 |
3.14 |
2.71828 |
2.72 |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
Name of the input column |
output |
|
str, list |
Overwrites input if none given |
decimals |
|
int |
Number of decimal places to round to, default is zero |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Calculate the similarity of two vectors.
Note: Similarity only works on vectors. In order to produce vectors for a column of strings, use create.embeddings.
SampleParameters¶ Rounding a Column
wrangles:
- create.embeddings:
input: col1
api_key: ${my_key}
output: col1 embeddings
- create.embeddings:
input: col2
api_key: ${my_key}
output: col2 embeddings
- similarity:
input:
- col1 embeddings
- col2 embeddings
output: similarity
method: adjusted cosine
| | |
col1 |
col2 |
col1 embeddings |
col2 embeddings |
similarity |
SKF |
Timken |
[1, 2, 3, 4] |
[4, 3, 2, 1] |
0.158931 |
Ball Bearing |
Roller Bearing |
[5, 6, 7, 8] |
[5, 6, 7, 9] |
0.942437 |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
list |
Two columns of vectors to compare the similarity of. |
output |
✓ |
str |
Name of the output column. |
method |
|
str |
The type of similarity to calculate (cosine or euclidean), defaults to cosine. Adjusted cosine adjusts the default cosine calculation to cover a range of 0-1 for typical comparisons. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
List of methods:
- cosine
- adjusted cosine
- euclidean
¶ Standardize
Run a standardize wrangle. e.g. A wrangle that expands abbreviations. A standardization wrangle must be trained first.
Click here to learn how to use Standardize Wrangles in Excel.
SampleParameterswrangles:
- standardize:
input: Abbrev
output: Abbreviations
model_id: code_here
| | |
|
→
|
Abbreviations |
As Soon As Possible |
Estimated Time of Arrival |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
|
str, list |
If omitted, overwrites input |
model_id |
✓ |
str, list |
The ID of the wrangle to use (do not include 'find' and 'replace') |
case_sensitive |
|
boolean |
Allows the wrangle to be case sensitive if set to True, default is False. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
This does not currently work with objects. If your table contains objects, convert.to_json prior to using SQL.
Apply a SQL command to the current dataframe.
Only SELECT statements are supported - the result will be the output. The current table is called df.
SampleParameterswrangles:
- sql:
command: |
SELECT header1, header2
FROM df
WHERE header1 >= 2
| | |
header1 |
header2 |
header3 |
1 |
a |
x |
2 |
b |
y |
3 |
c |
z |
|
→
|
|
Parameter |
Required |
Data Type |
Notes |
command |
✓ |
str |
SQL Command. The table is called df. For specific SQL syntax, this uses the SQLite dialect. |
Sql is not compatible with where filtering
Translate the input column to another language. Powered by DeepL. A list of language codes can be found here.
Click here to learn how to use Translate Wrangles in Excel.
SampleParameterswrangles:
- translate:
input: Español
output: English
source_language: Spanish
target_language: English (British)
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
✓ |
str, list |
|
target_language |
✓ |
str |
Code of the language to translate to |
source_language |
|
str |
Code of the language to translate from. If omitted, automatically detects the input language |
case |
|
lower/ upper/ title/ sentence |
Allow changing the case of the input prior to translation. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |