# STATS

The STATS command groups rows by a common value and calculates one or more aggregated values over the grouped rows. You can filter which rows are included in each aggregation using a WHERE clause. If no grouping is specified, the output contains a single row with the aggregation(s) applied over the entire dataset.

## Syntax

`STATS [column1 =] expression1 [WHERE boolean_expression1][, ..., [columnN =] expressionN [WHERE boolean_expressionN]] [BY grouping_expression1[, ..., grouping_expressionN]]`

### Parameters

#### columnX

The name by which the aggregated value is returned. If omitted, the name is equal to the corresponding expression. If multiple columns have the same name, only the rightmost column with this name is used.

#### expressionX

An expression that computes an aggregated value.

#### grouping_expressionX

An expression that outputs the values to group by. If its name coincides with one of the computed columns, that column will be ignored.

#### boolean_expressionX

(Optional) The condition that must be met for a row to be included in the evaluation of expressionX.

Individual `null` values are skipped when computing aggregations.

## Examples

Calculate the count of employees grouped by language:

```esql
FROM employees
| STATS count = COUNT(emp_no) BY languages
| SORT languages
```
Groups employees by language and counts the number in each group.

Return one row with the average number of languages across all employees:

```esql
FROM employees
| STATS avg_lang = AVG(languages)
```
Calculates the average value of the `languages` field for all employees.

Calculate multiple aggregated values:

```esql
FROM employees
| STATS avg_lang = AVG(languages), max_lang = MAX(languages)
```
Calculates both the average and maximum number of languages.

Filter rows for aggregation using WHERE:

```esql
FROM employees
| STATS avg50s = AVG(salary)::LONG WHERE birth_date < "1960-01-01", avg60s = AVG(salary)::LONG WHERE birth_date >= "1960-01-01" BY gender
| SORT gender
```
Calculates average salary for employees born before and after 1960, grouped by gender.

Mix aggregations with and without filters, and optional grouping:

```esql
FROM employees
| EVAL Ks = salary / 1000 // thousands
| STATS under_40K = COUNT(*) WHERE Ks < 40, inbetween = COUNT(*) WHERE 40 <= Ks AND Ks < 60, over_60K  = COUNT(*) WHERE 60 <= Ks, total     = COUNT(*)
```
Counts employees in different salary ranges and totals.

If the grouping key is multivalued, the input row is included in all groups:

```esql
ROW i=1, a=["a", "b"]
| STATS MIN(i) BY a
| SORT a ASC
```
Expands the row into multiple groups for each value in `a`.

Group by multiple values:

```esql
FROM employees
| EVAL hired = DATE_FORMAT("yyyy", hire_date)
| STATS avg_salary = AVG(salary) BY hired, languages.long
| EVAL avg_salary = ROUND(avg_salary)
| SORT hired, languages.long
```
Calculates average salary grouped by hire year and language.

If all grouping keys are multivalued, the input row is included in all groups:

```esql
ROW i=1, a=["a", "b"], b=[2, 3]
| STATS MIN(i) BY a, b
| SORT a ASC, b ASC
```
Expands the row into all combinations of `a` and `b`.

Use functions in aggregating and grouping expressions:

```esql
FROM employees
| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
```
Calculates the average salary change, using `MV_AVG` to handle multivalued fields.

Group by an expression, such as the first letter of last name:

```esql
FROM employees
| STATS my_count = COUNT() BY LEFT(last_name, 1)
| SORT `LEFT(last_name, 1)`
```
Counts employees grouped by the first letter of their last name.

Specifying the output column name is optional:

```esql
FROM employees
| STATS AVG(salary)
```
Returns a column named `AVG(salary)`.

Quoting column names with special characters:

```esql
FROM employees
| STATS AVG(salary)
| EVAL avg_salary_rounded = ROUND(`AVG(salary)`)
```
Uses backticks to reference a column with special characters in its name.

## Limitations

- A bug in the STATS command may yield incorrect results when grouping by exactly two keyword fields, where the first field has high cardinality (more than 65,000 distinct values). This issue was introduced in version 8.16.0 and fixed in 8.17.9 and 8.18.7. Workarounds include switching the order of the grouping keys or reducing the cardinality by filtering before STATS.
- Grouping on a single expression is much more optimized than grouping on multiple expressions.
- STATS without any groups is much faster than adding a group.