Apply a series of wrangles to each element of a list individually.
This example shows how to use convert.case on a list of strings, where normally it would not work on a list.
wrangles:
  - accordion:
      input: list_column
      output: modified_lists
      wrangles:
        - convert.case:
            input: list_column
            output: modified_lists
            case: upper
| list_column |
| --- |
| ["a", "b", "c"] |
| ["e", "f", "g"] |

→

| list_column | modified_lists |
| --- | --- |
| ["a", "b", "c"] | ["A", "B", "C"] |
| ["e", "f", "g"] | ["E", "F", "G"] |
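As a rough plain-Python analogy of the behaviour shown above (not the library's implementation), the element-wise application can be sketched as follows. The helper name `accordion` and its signature are hypothetical and exist only for this illustration.

```python
def accordion(rows, input_col, output_col, func):
    """Apply func to every element of the list stored in input_col, row by row.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original row is left untouched
        new_row[output_col] = [func(item) for item in row[input_col]]
        result.append(new_row)
    return result


rows = [
    {"list_column": ["a", "b", "c"]},
    {"list_column": ["e", "f", "g"]},
]

# str.upper stands in for the convert.case wrangle with case: upper
modified = accordion(rows, "list_column", "modified_lists", str.upper)
```

Each row keeps its original list and gains a new list of the same length, which mirrors the before and after tables.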
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| input | ✓ | str, list | The column(s) containing the list(s) whose elements the wrangles will be applied to. Note: When accordioning on multiple columns, they must have matching element counts. |
| output | ✓ | str, list | Output of the wrangles to save back to the dataframe. Note: All columns which are created within the accordion will be dropped if they are not listed in the output. |
| wrangles | ✓ | list | List of wrangles to apply. |
| propagate | | str, list | Limit the column(s) that will be available to the wrangles and replicated for each element. If not specified, all columns will be propagated. This may be useful to limit memory use for large datasets. |
| where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
| where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
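The where and where_params notes refer to SQLite placeholder syntax. The sketch below uses Python's built-in sqlite3 module directly, not the wrangles library, purely to show how `?` and `:name` placeholders bind to values. The table name `data` and the sample rows are assumptions made for the illustration.

```python
import sqlite3

# In-memory database with a small assumed table for the demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(123, "xyz"), (456, "abc"), (789, "def")],
)

# Positional placeholders (?) are filled from a sequence, in order
positional = conn.execute(
    "SELECT column1, column2 FROM data "
    "WHERE column1 = ? OR column2 = ? ORDER BY column1",
    (123, "abc"),
).fetchall()

# Named placeholders (:name) are filled from a mapping, by key
named = conn.execute(
    "SELECT column1, column2 FROM data "
    "WHERE column1 = :id OR column2 = :label ORDER BY column1",
    {"id": 123, "label": "abc"},
).fetchall()

conn.close()
```

Both queries select the same two rows. Passing values separately from the criteria avoids building the where string by hand.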
Execute a series of wrangles on the data in batches. The batches can optionally be executed in parallel using the threads parameter, and on_error can provide a default output when a batch fails.
This example shows how to use batch with an Extract AI Wrangle.
wrangles:
  - batch:
      batch_size: 2
      threads: 1
      wrangles:
        - extract.ai:
            api_key: Your OpenAI api key
            input: Product Description
            output:
              Title:
                type: string
                description: Title of the product
| Product Description |
| --- |
| Sleep better with our Memory Foam Pillow, designed to contour to your head and neck. |
| Stay comfortable and stylish with our Organic Cotton T-Shirt, made from soft, breathable fabric. |
| Keep drinks hot or cold with our Stainless Steel Water Bottle, featuring durable insulation. |
| Enjoy crisp sound and long battery life with our Wireless Bluetooth Earbuds. |

→

| Product Description | Title |
| --- | --- |
| Sleep better with our Memory Foam Pillow, designed to contour to your head and neck. | Memory Foam Pillow |
| Stay comfortable and stylish with our Organic Cotton T-Shirt, made from soft, breathable fabric. | Organic Cotton T-Shirt |
| Keep drinks hot or cold with our Stainless Steel Water Bottle, featuring durable insulation. | Stainless Steel Water Bottle |
| Enjoy crisp sound and long battery life with our Wireless Bluetooth Earbuds. | Wireless Bluetooth Earbuds |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| batch_size | ✓ | int | The number of rows in each batch. |
| wrangles | ✓ | dict | Wrangles to apply to the data (this can be thought of as a sub recipe). |
| threads | | int | The number of batches that are run in parallel. |
| on_error | | dict | Provides a default output if there is an error within the batch. |
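How batch_size, threads and on_error interact can be sketched in plain Python. This is an analogy under stated assumptions, not the library's implementation: `run_in_batches` and `add_length` are hypothetical names, and the sketch assumes the on_error default is applied to every row of a failing batch.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_batches(rows, batch_size, func, threads=1, on_error=None):
    """Split rows into batches and apply func to each row of each batch.

    Hypothetical helper: if a batch raises and on_error is given, every row
    in that batch receives the on_error values instead (an assumption).
    """
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

    def run_one(batch):
        try:
            return [func(row) for row in batch]
        except Exception:
            if on_error is None:
                raise
            return [dict(row, **on_error) for row in batch]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in submission order, so row order is preserved
        results = list(pool.map(run_one, batches))
    return [row for batch in results for row in batch]


def add_length(row):
    """Stand-in for a real wrangle; fails on non-string input."""
    if not isinstance(row["text"], str):
        raise TypeError("text must be a string")
    return dict(row, length=len(row["text"]))


rows = [{"text": "ab"}, {"text": "abc"}, {"text": None}, {"text": "a"}]
out = run_in_batches(rows, batch_size=2, func=add_length,
                     threads=2, on_error={"length": -1})
```

In this sketch the second batch contains a bad row, so both of its rows receive the default while the first batch is unaffected.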
Create a copy of columns in a dataframe.
Copying a Column
wrangles:
  - copy:
      input: Product Data
      output: Product Data (copy)
| Product Data |
| --- |
| SKF ball brg |
| brg seal |

→

| Product Data | Product Data (copy) |
| --- | --- |
| SKF ball brg | SKF ball brg |
| brg seal | brg seal |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| input | ✓ | str, list | |
| output | ✓ | str, list | |
| where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
| where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Drop columns within a dataframe.
Dropping a Column
wrangles:
  - drop:
      columns:
        - Material
| Product Data | Material |
| --- | --- |
| SKF ball brg | Ceramic |
| brg seal | Rubber |

→

| Product Data |
| --- |
| SKF ball brg |
| brg seal |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| columns | ✓ | str, list | Column(s) to be dropped. |

Drop is not compatible with where filtering.
Explode a column of lists into rows.
Exploding a Column
wrangles:
  - explode:
      input: Products
| Manufacturer | Products |
| --- | --- |
| SKF | [Ball Bearing, Bearing Seal] |
| Milwaukee | [Angle Grinder, Drill, Impact Driver] |
| Schneider | Solid State Relay |

→

| Manufacturer | Products |
| --- | --- |
| SKF | Ball Bearing |
| SKF | Bearing Seal |
| Milwaukee | Angle Grinder |
| Milwaukee | Drill |
| Milwaukee | Impact Driver |
| Schneider | Solid State Relay |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| input | ✓ | str, list | Name of the column(s) to explode. If multiple columns are included they must contain lists of the same length. |
| reset_index | | bool | Reset the index after exploding. Default False. |
| drop_empty | | bool | Empty lists will not produce a row in the exploded output. Default False. |
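The row expansion and the effect of drop_empty can be approximated in plain Python. This is a sketch, not the library's implementation: the helper `explode` is hypothetical, the "Acme" row is invented to show an empty list, and the sketch assumes that with drop_empty off an empty list still yields one row with an empty value.

```python
def explode(rows, column, drop_empty=False):
    """Turn each list element in `column` into its own row (hypothetical helper)."""
    out = []
    for row in rows:
        value = row[column]
        if not isinstance(value, list):
            out.append(dict(row))  # non-list values pass through unchanged
        elif value:
            # one output row per element, other columns are repeated
            out.extend(dict(row, **{column: item}) for item in value)
        elif not drop_empty:
            # assumption: an empty list keeps a single row with an empty value
            out.append(dict(row, **{column: None}))
    return out


rows = [
    {"Manufacturer": "SKF", "Products": ["Ball Bearing", "Bearing Seal"]},
    {"Manufacturer": "Milwaukee", "Products": ["Angle Grinder", "Drill", "Impact Driver"]},
    {"Manufacturer": "Schneider", "Products": "Solid State Relay"},
    {"Manufacturer": "Acme", "Products": []},  # invented row with an empty list
]

kept = explode(rows, "Products")
dropped = explode(rows, "Products", drop_empty=True)
```

Here `kept` contains a row for the manufacturer with the empty list and `dropped` does not; all other rows are identical.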
Print the current status of the dataframe. Only a sample of rows will be logged.
Logging All Columns to Terminal
wrangles:
  - log: {}
Logging Specific Columns to Terminal
wrangles:
  - log:
      columns:
        - column1
        - column2
Logging to a File
wrangles:
  - log:
      write:
        - file:
            name: output/filepath
            columns:
              - column 1
              - column 2
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| columns | | list | List of specific columns to log. Defaults to all columns. |
| write | | list | Allows for an intermediate output to a file/dataframe/database etc. |
| error | | str | Log an error to the console. |
| warning | | str | Log a warning to the console. |
| info | | str | Log info to the console. |
| log_data | | bool | Whether to log a sample of the contents of the dataframe. Default True. |
| where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
| where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
Rename a column or list of columns.
wrangles:
  - rename:
      input:
        - Manufacturer Name
        - Manufacturer Part Number
      output:
        - Manufacturer
        - MPN
| Manufacturer Name | Manufacturer Part Number |
| --- | --- |
| SKF | 302-2 |
| Timken | PF48 |

→

| Manufacturer | MPN |
| --- | --- |
| SKF | 302-2 |
| Timken | PF48 |
Rename is unique among wrangles in that it can be used without naming input and output. Simply list each column to be renamed followed by its new name, separated by a colon.
wrangles:
  - rename:
      Manufacturer Name: Manufacturer
      Manufacturer Part Number: MPN
| Manufacturer Name | Manufacturer Part Number |
| --- | --- |
| SKF | 302-2 |
| Timken | PF48 |

→

| Manufacturer | MPN |
| --- | --- |
| SKF | 302-2 |
| Timken | PF48 |
Wrangles can also be used to rename columns, but only in place of the standard rename, not alongside it. Add wrangles as a parameter, then list the wrangles you wish to use. Note: when using wrangles to rename, the column names are provided in a column named 'columns', and a column named 'columns' must be returned.
wrangles:
  - rename:
      wrangles:
        - convert.case:
            input: columns
            case: upper
| Manufacturer Name | Manufacturer Part Number |
| --- | --- |
| SKF | 302-2 |
| Timken | PF48 |

→

| MANUFACTURER NAME | MANUFACTURER PART NUMBER |
| --- | --- |
| SKF | 302-2 |
| Timken | PF48 |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| input | | str, list | |
| output | | str, list | |
| wrangles | | array | Use wrangles to transform the column names. The input is named 'columns' and the final result must also include the column named 'columns'. This can only be used instead of the standard rename. |

Rename is not compatible with where filtering.
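Renaming through wrangles amounts to treating the column names themselves as data, transforming them, and using the result as the new names. A minimal plain-Python sketch of that idea follows; `rename_with` is a hypothetical helper and `str.upper` stands in for convert.case with case: upper.

```python
def rename_with(rows, transform):
    """Return rows whose keys (column names) have been passed through transform.

    Hypothetical helper for illustration; values are left untouched.
    """
    return [
        {transform(name): value for name, value in row.items()}
        for row in rows
    ]


rows = [
    {"Manufacturer Name": "SKF", "Manufacturer Part Number": "302-2"},
    {"Manufacturer Name": "Timken", "Manufacturer Part Number": "PF48"},
]

renamed = rename_with(rows, str.upper)
```

Only the names change; the values in each row are the same as before.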
Sort the rows of a dataframe by the values in one or more columns.
wrangles:
  - sort:
      by: Price
      ascending: true
| Item | Price |
| --- | --- |
| Hammer | 11.99 |
| Chisel | 4.99 |
| Drill | 29.99 |
| Wrench | 6.99 |
| Saw | 13.99 |

→

| Item | Price |
| --- | --- |
| Chisel | 4.99 |
| Wrench | 6.99 |
| Hammer | 11.99 |
| Saw | 13.99 |
| Drill | 29.99 |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| by | ✓ | str, list | Name or list of the column(s) to sort by. |
| ascending | | bool | Sort ascending vs. descending. Specify a list to sort multiple columns in different orders. If this is a list of bools, it must match the length of by. |
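The rule that a list of ascending flags must match the length of by can be illustrated in plain Python. This is a sketch of the idea, not the library's implementation: `sort_rows` is a hypothetical helper that relies on Python's stable sort, applying the keys from last to first so that the first key takes priority.

```python
def sort_rows(rows, by, ascending=True):
    """Sort a list of dict rows by one or more keys (hypothetical helper)."""
    keys = [by] if isinstance(by, str) else list(by)
    if isinstance(ascending, bool):
        orders = [ascending] * len(keys)  # one flag applies to every key
    else:
        orders = list(ascending)
    if len(orders) != len(keys):
        raise ValueError("ascending must match the length of by")

    result = list(rows)
    # Stable sort: apply the least significant key first
    for key, asc in reversed(list(zip(keys, orders))):
        result.sort(key=lambda row: row[key], reverse=not asc)
    return result


items = [
    {"Item": "Hammer", "Price": 11.99},
    {"Item": "Chisel", "Price": 4.99},
    {"Item": "Drill", "Price": 29.99},
    {"Item": "Wrench", "Price": 6.99},
    {"Item": "Saw", "Price": 13.99},
]

cheapest_first = sort_rows(items, by="Price", ascending=True)
```

Passing `by=["a", "b"]` with `ascending=[True, False]` would sort by `a` ascending and break ties by `b` descending.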
Transpose a dataframe.
wrangles:
  - transpose: {}
| Product Data | Material |
| --- | --- |
| SKF ball brg | Ceramic |
| brg seal | Rubber |

→

|  |  |  |
| --- | --- | --- |
| Product Data | SKF ball brg | brg seal |
| Material | Ceramic | Rubber |

Transpose is not compatible with where filtering.
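Transposing swaps rows and columns, so the former column headers become the first value of each row. A minimal plain-Python sketch of the same operation on a list of lists (not the library's implementation) is:

```python
# The header row is included so that it becomes the first column after transposing
table = [
    ["Product Data", "Material"],
    ["SKF ball brg", "Ceramic"],
    ["brg seal", "Rubber"],
]

# zip(*table) groups the i-th element of every row together
transposed = [list(column) for column in zip(*table)]
```

Applying the same step to `transposed` returns the original table.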