
Complete Guide: List of Essential BigQuery Commands for Effective Data Analysis

Here is a list of some common and essential commands and statements used in Google BigQuery:

  1. Creating and Managing Tables:
  • CREATE TABLE: Create a new table.
  • CREATE TABLE AS: Create a new table based on a query result.
  • DELETE: Delete rows from a table.
  • ALTER TABLE: Modify an existing table.
  • DROP TABLE: Delete a table.
  2. Querying Data:
  • SELECT: Retrieve data from one or more tables.
  • JOIN: Combine data from two or more tables.
  • GROUP BY: Group rows based on a specific column.
  • HAVING: Filter grouped results.
  • ORDER BY: Sort query results.
  • LIMIT: Limit the number of rows returned.
  • UNION ALL / UNION DISTINCT: Combine the result sets of two or more SELECT statements.
  3. Filtering Data:
  • WHERE: Filter rows based on specified conditions.
  • AND / OR: Combine multiple conditions.
  • IN: Filter data based on a list of values.
  • BETWEEN: Filter data within a specified range.
  • LIKE: Filter data using pattern matching.
  4. Aggregating and Transforming Data:
  • SUM, AVG, MIN, MAX: Aggregate functions.
  • COUNT: Count the number of rows.
  • CASE: Perform conditional transformations.
  • CAST: Convert data types.
  5. Partitioning and Clustering:
  • PARTITION BY: Partition a table into segments for efficient querying.
  • CLUSTER BY: Cluster table data to improve performance.
  6. Managing Functions and Procedures:
  • CREATE FUNCTION: Create a user-defined function.
  • CREATE PROCEDURE: Create a stored procedure.
  • CALL: Execute a stored procedure.
  7. Loading and Exporting Data:
  • LOAD DATA: Load data from Cloud Storage files (e.g., CSV or JSON) into a table.
  • EXPORT DATA: Export table data to a Cloud Storage bucket.
  8. Managing Datasets:
  • CREATE SCHEMA: Create a new dataset.
  • DROP SCHEMA: Delete a dataset.
  • Copying tables: Copy tables from one dataset to another (e.g., with CREATE TABLE AS).
  9. Information and Metadata:
  • INFORMATION_SCHEMA: Query metadata and schema information.
  • bq ls / bq show: CLI commands to list datasets and inspect table schemas.
  10. Views:
  • CREATE VIEW: Create a virtual table based on a query.
  • DROP VIEW: Delete a view.
  11. User and Access Management:
  • GRANT: Grant permissions (IAM roles) to users or groups.
  • REVOKE: Revoke permissions from users or groups.

  1. CREATE TABLE: Create a new table
CREATE TABLE my_dataset.new_table (
    id INT64,
    name STRING,
    age INT64
);

This command creates a new table named new_table in the dataset my_dataset with three columns: id, name, and age.

  2. CREATE TABLE AS: Create a new table based on a query result
CREATE TABLE my_dataset.new_table_filtered AS
SELECT id, name
FROM my_dataset.existing_table
WHERE age > 18;

This command creates a new table named new_table_filtered in my_dataset with the columns id and name, populated with data from existing_table where the age is greater than 18.
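
Since CREATE TABLE AS is standard SQL, its behavior is easy to check locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery (the table names mirror the placeholders above; the rows are made up for illustration):

```python
import sqlite3

# In-memory database standing in for BigQuery (illustrative only;
# BigQuery-specific options are omitted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE existing_table (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO existing_table VALUES (?, ?, ?)",
    [(1, "Ann", 17), (2, "Bob", 25), (3, "Cid", 40)],
)

# CREATE TABLE AS: the new table inherits its schema from the SELECT list.
conn.execute(
    "CREATE TABLE new_table_filtered AS "
    "SELECT id, name FROM existing_table WHERE age > 18"
)

rows = conn.execute("SELECT id, name FROM new_table_filtered ORDER BY id").fetchall()
print(rows)  # [(2, 'Bob'), (3, 'Cid')]
```

The new table has only the two columns named in the SELECT list and only the rows that passed the WHERE filter.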

  3. DELETE: Delete rows from a table
DELETE FROM my_dataset.my_table
WHERE age < 21;

This command deletes rows from my_table in my_dataset where the age is less than 21.

  4. ALTER TABLE: Modify an existing table
ALTER TABLE my_dataset.my_table
ADD COLUMN email STRING;

This command adds a new column named email to my_table in my_dataset.

  5. DROP TABLE: Delete a table
DROP TABLE my_dataset.my_table;

This command deletes the table my_table from the dataset my_dataset.

  6. SELECT: Retrieve data from one or more tables
SELECT id, name, age
FROM my_dataset.my_table;

This query retrieves the id, name, and age columns from my_table in my_dataset.

  7. JOIN: Combine data from two or more tables
SELECT t1.id, t1.name, t2.salary
FROM my_dataset.table1 AS t1
JOIN my_dataset.table2 AS t2
ON t1.id = t2.id;

This query performs an inner join between table1 and table2 in my_dataset, combining data based on matching id values.
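
Inner-join semantics are the same across SQL engines, so a minimal check with Python's sqlite3 module (a local stand-in for BigQuery, with made-up rows) shows how ids without a match in both tables are dropped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cid")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, 50000), (3, 70000)])

# Inner join: only ids present in BOTH tables survive (id=2 is dropped).
rows = conn.execute(
    "SELECT t1.id, t1.name, t2.salary "
    "FROM table1 AS t1 JOIN table2 AS t2 ON t1.id = t2.id "
    "ORDER BY t1.id"
).fetchall()
print(rows)  # [(1, 'Ann', 50000), (3, 'Cid', 70000)]
```

Bob (id=2) has no salary row in table2, so an inner join silently excludes him; a LEFT JOIN would keep him with a NULL salary.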

  8. GROUP BY: Group rows based on a specific column
SELECT department, AVG(salary) AS avg_salary
FROM my_dataset.employee
GROUP BY department;

This query groups the data in the employee table by department and calculates the average salary for each group.

  9. HAVING: Filter grouped results
SELECT department, AVG(salary) AS avg_salary
FROM my_dataset.employee
GROUP BY department
HAVING AVG(salary) > 50000;

This query groups the data by department and calculates the average salary, then filters the groups to include only those with an average salary greater than 50,000.
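
The key point, that WHERE filters rows before grouping while HAVING filters the groups afterwards, can be verified with a small sqlite3 sketch (a local stand-in for BigQuery; the employee rows are synthetic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [
    ("Sales", 40000), ("Sales", 44000),
    ("Eng", 60000), ("Eng", 80000),
])

# HAVING is applied to each group's aggregate, after GROUP BY:
# Sales averages 42000 and is filtered out; Eng averages 70000 and stays.
rows = conn.execute(
    "SELECT department, AVG(salary) AS avg_salary "
    "FROM employee GROUP BY department "
    "HAVING AVG(salary) > 50000"
).fetchall()
print(rows)  # [('Eng', 70000.0)]
```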

  10. ORDER BY: Sort query results
SELECT name, age
FROM my_dataset.my_table
ORDER BY age DESC;

This query retrieves name and age from my_table and sorts the results by age in descending order.

  11. LIMIT: Limit the number of rows returned
SELECT name, age
FROM my_dataset.my_table
LIMIT 10;

This query retrieves name and age from my_table and limits the result set to the first 10 rows.
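
ORDER BY and LIMIT are usually combined for "top N" queries: sort first, then cut. A quick sqlite3 sketch (a local stand-in for BigQuery, with made-up rows) illustrates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [("Ann", 31), ("Bob", 45), ("Cid", 28), ("Dee", 39)])

# The two oldest people: LIMIT applies after the sort.
rows = conn.execute(
    "SELECT name, age FROM my_table ORDER BY age DESC LIMIT 2"
).fetchall()
print(rows)  # [('Bob', 45), ('Dee', 39)]
```

Note that in BigQuery LIMIT reduces the rows returned but not the bytes scanned, so it does not by itself reduce query cost.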

  12. UNION: Combine the result sets of two or more SELECT statements
SELECT name, age FROM my_dataset.table1
UNION ALL
SELECT name, age FROM my_dataset.table2;

This query combines the result sets of two separate SELECT statements from table1 and table2 using the UNION ALL operator.
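
Note that BigQuery requires an explicit quantifier: UNION ALL keeps duplicates, while UNION DISTINCT removes them (a bare UNION is a syntax error in BigQuery). The difference can be seen with sqlite3 as a local stand-in, where plain UNION behaves like BigQuery's UNION DISTINCT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE table2 (name TEXT, age INTEGER)")
conn.execute("INSERT INTO table1 VALUES ('Ann', 30), ('Bob', 40)")
conn.execute("INSERT INTO table2 VALUES ('Ann', 30), ('Cid', 50)")

# UNION ALL keeps the duplicate ('Ann', 30) row; UNION collapses it.
all_rows = conn.execute(
    "SELECT name, age FROM table1 UNION ALL SELECT name, age FROM table2"
).fetchall()
distinct_rows = conn.execute(
    "SELECT name, age FROM table1 UNION SELECT name, age FROM table2"
).fetchall()
print(len(all_rows), len(distinct_rows))  # 4 3
```

UNION ALL is also cheaper, since the engine skips the deduplication step.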

  13. WHERE: Filter rows based on specified conditions
SELECT name, age
FROM my_dataset.my_table
WHERE age > 25;

This query retrieves name and age from my_table where the age is greater than 25.

  14. AND / OR: Combine multiple conditions
SELECT name, age
FROM my_dataset.my_table
WHERE age > 25 AND department = 'Sales';

This query retrieves name and age from my_table where the age is greater than 25 and the department is ‘Sales’.

  15. IN: Filter data based on a list of values
SELECT name, department
FROM my_dataset.my_table
WHERE department IN ('Sales', 'Marketing');

This query retrieves name and department from my_table where the department is either ‘Sales’ or ‘Marketing’.

  16. BETWEEN: Filter data within a specified range
SELECT name, age
FROM my_dataset.my_table
WHERE age BETWEEN 30 AND 40;

This query retrieves name and age from my_table where the age is between 30 and 40.
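
A common gotcha: BETWEEN is inclusive on both endpoints. A sqlite3 sketch (a local stand-in for BigQuery, with made-up rows) confirms that ages 30 and 40 both match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [("Ann", 29), ("Bob", 30), ("Cid", 40), ("Dee", 41)])

# BETWEEN 30 AND 40 is equivalent to: age >= 30 AND age <= 40.
rows = conn.execute(
    "SELECT name FROM my_table WHERE age BETWEEN 30 AND 40 ORDER BY name"
).fetchall()
print([r[0] for r in rows])  # ['Bob', 'Cid']
```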

  17. LIKE: Filter data using pattern matching
SELECT name
FROM my_dataset.my_table
WHERE name LIKE 'J%';

This query retrieves name from my_table where the name starts with ‘J’.
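
The LIKE wildcards are % (any run of characters) and _ (exactly one character). The sqlite3 sketch below illustrates the 'J%' prefix match; one caveat when translating results back to BigQuery is that sqlite's LIKE is case-insensitive for ASCII by default, whereas BigQuery's LIKE is case-sensitive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?)",
                 [("John",), ("Jane",), ("Mary",)])

# 'J%' matches any name beginning with J, of any length.
rows = conn.execute(
    "SELECT name FROM my_table WHERE name LIKE 'J%' ORDER BY name"
).fetchall()
print([r[0] for r in rows])  # ['Jane', 'John']
```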

Here are query examples in BigQuery based on the specified criteria for aggregating and transforming data:

  1. SUM, AVG, MIN, MAX: Aggregate functions
SELECT department, SUM(salary) AS total_salary, AVG(salary) AS avg_salary,
    MIN(salary) AS min_salary, MAX(salary) AS max_salary
FROM my_dataset.employee
GROUP BY department;

This query calculates the total salary, average salary, minimum salary, and maximum salary for each department in the employee table.

  2. COUNT: Count the number of rows
SELECT department, COUNT(*) AS employee_count
FROM my_dataset.employee
GROUP BY department;

This query counts the number of employees in each department in the employee table.

  3. CASE: Perform conditional transformations
SELECT name,
    CASE
        WHEN age < 30 THEN 'Young'
        WHEN age >= 30 AND age <= 50 THEN 'Middle-aged'
        ELSE 'Senior'
    END AS age_group
FROM my_dataset.my_table;

This query creates an age_group column based on the age values in the my_table table.
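
CASE branches are evaluated top to bottom and the first matching branch wins; the ELSE branch catches everything else. The sketch below reproduces the age_group logic with sqlite3 (a local stand-in for BigQuery, with made-up rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [("Ann", 25), ("Bob", 45), ("Cid", 60)])

# Each row falls into the first branch whose condition is true.
rows = conn.execute("""
    SELECT name,
           CASE
               WHEN age < 30 THEN 'Young'
               WHEN age >= 30 AND age <= 50 THEN 'Middle-aged'
               ELSE 'Senior'
           END AS age_group
    FROM my_table ORDER BY name
""").fetchall()
print(rows)  # [('Ann', 'Young'), ('Bob', 'Middle-aged'), ('Cid', 'Senior')]
```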

  4. CAST: Convert data types
SELECT name, age, CAST(salary AS STRING) AS salary_str
FROM my_dataset.my_table;

This query converts the salary column to a string data type in the my_table table.
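
The same conversion can be tried locally with sqlite3; note that sqlite spells the string type TEXT where BigQuery uses STRING:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CAST works on expressions as well as columns; both directions shown here.
row = conn.execute("SELECT CAST(12345 AS TEXT), CAST('42' AS INTEGER)").fetchone()
print(row)  # ('12345', 42)
```

In BigQuery, SAFE_CAST is often preferable to CAST, since it returns NULL instead of failing the query when a value cannot be converted.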

Here are query examples in BigQuery based on the specified criteria for partitioning and clustering:

  1. PARTITION BY: Partition a table into segments for efficient querying
CREATE TABLE my_dataset.partitioned_table
PARTITION BY DATE(timestamp_column)
AS
SELECT *
FROM my_dataset.source_table;

In this query, the partitioned_table is created by partitioning the data from source_table based on the timestamp_column.

  2. CLUSTER BY: Cluster table data to improve performance
CREATE OR REPLACE TABLE my_dataset.clustered_table
CLUSTER BY category_id
AS
SELECT *
FROM my_dataset.source_table;

This query creates or replaces the clustered_table by clustering the data from source_table based on the category_id.

Please replace my_dataset, partitioned_table, source_table, timestamp_column, clustered_table, and category_id with your actual dataset, table names, column names, and criteria in your BigQuery environment.

Here are query examples in BigQuery based on the specified criteria for managing functions and procedures:

  1. CREATE FUNCTION: Create a user-defined function
CREATE FUNCTION my_dataset.my_function(x INT64, y INT64) AS (
    x + y
);

This query creates a user-defined function named my_function in my_dataset. The function takes two integer parameters, x and y, and returns their sum.

  2. CREATE PROCEDURE: Create a stored procedure
CREATE OR REPLACE PROCEDURE my_dataset.my_procedure(INOUT total INT64)
BEGIN
    SET total = total * 2;
END;

This query creates or replaces a stored procedure named my_procedure in my_dataset. The procedure takes an INOUT parameter total and doubles its value.

  3. CALL: Execute a stored procedure
DECLARE total INT64 DEFAULT 10;
CALL my_dataset.my_procedure(total);
SELECT total AS doubled_total;

This script declares a variable total, calls my_procedure in my_dataset to double its value, and then selects the doubled total.

Please replace my_dataset, my_function, my_procedure, and other placeholders with your actual dataset, function name, procedure name, and parameters in your BigQuery environment. Note that user-defined functions and procedures require GoogleSQL (formerly standard SQL); they are not available in legacy SQL.

Here are query examples in BigQuery based on the specified criteria for loading and exporting data:

  1. LOAD DATA: Load data from a CSV or JSON file into a table
Let's assume you have a CSV file named data.csv in a Cloud Storage bucket gs://my_bucket/data/ and you want to load it into a table named my_table in the dataset my_dataset.
-- Load CSV data into a table
LOAD DATA INTO my_dataset.my_table
FROM FILES(
    format = 'CSV',
    uris = ['gs://my_bucket/data/data.csv'],
    skip_leading_rows = 1
);

In this query, the LOAD DATA statement loads CSV data from the specified Cloud Storage path into the my_table table. The skip_leading_rows option skips the header row.

  2. EXPORT DATA: Export table data to a Cloud Storage bucket
Let's assume you want to export data from a table named my_table in the dataset my_dataset to a Cloud Storage bucket gs://my_bucket/exported_data/.
-- Export table data to Cloud Storage
EXPORT DATA
    OPTIONS(
        uri='gs://my_bucket/exported_data/*.csv',
        format='CSV',
        overwrite=true
    )
AS
SELECT *
FROM my_dataset.my_table;

In this query, the EXPORT DATA statement exports data from the my_table table to CSV files in the specified Cloud Storage path. The format parameter specifies the export format, and overwrite indicates that existing files can be overwritten.

Please replace my_dataset, my_table, gs://my_bucket/data/, gs://my_bucket/exported_data/, and other placeholders with your actual dataset, table names, Cloud Storage paths, and options in your BigQuery environment.

Here are query examples in BigQuery based on the specified criteria for managing datasets:

  1. CREATE SCHEMA: Create a new dataset
Let's assume you want to create a new dataset named my_new_dataset. In BigQuery DDL, a dataset is created with the CREATE SCHEMA statement.
-- Create a new dataset
CREATE SCHEMA my_project.my_new_dataset;

In this query, the CREATE SCHEMA statement creates a new dataset named my_new_dataset in the my_project project.

  2. DROP SCHEMA: Delete a dataset
Let's assume you want to delete the dataset my_old_dataset and all its tables.
-- Delete a dataset and its tables
DROP SCHEMA IF EXISTS my_project.my_old_dataset CASCADE;

In this query, the DROP SCHEMA statement deletes the dataset my_old_dataset. The CASCADE option ensures that the dataset is deleted even if it still contains tables and other objects.

  3. COPY: Copy tables from one dataset to another
Let's assume you want to copy a table from source_dataset to destination_dataset.
-- Copy tables from one dataset to another
CREATE OR REPLACE TABLE destination_dataset.new_table AS
SELECT *
FROM source_dataset.existing_table;

In this query, the CREATE OR REPLACE TABLE statement creates a new table named new_table in the destination_dataset and copies data from the existing_table in the source_dataset.

Please replace my_project, my_new_dataset, my_old_dataset, source_dataset, destination_dataset, existing_table, and new_table with your actual project, dataset names, and table names in your BigQuery environment.

Here are query examples in BigQuery based on the specified criteria for information and metadata:

  1. Listing datasets: BigQuery does not support a SHOW statement; use the INFORMATION_SCHEMA views or the bq CLI instead.
Let's assume you want to list all datasets in your project.
-- List the datasets in a project (one region at a time)
SELECT schema_name
FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA;

This query lists every dataset in the project's US region. Equivalently, the CLI command bq ls lists the datasets in the default project.

  2. Describing a table: BigQuery likewise has no DESCRIBE statement; retrieve a table's schema with the bq CLI or INFORMATION_SCHEMA.
Let's assume you want to retrieve metadata about the columns of a table named my_table in the dataset my_dataset.
-- Show a table's schema from the command line
bq show --schema --format=prettyjson my_dataset.my_table

This command prints the column names, types, and modes of my_table in my_dataset. The INFORMATION_SCHEMA.COLUMNS query below achieves the same from SQL.

  3. INFORMATION_SCHEMA: Query metadata and schema information
Let's assume you want to query schema information for tables in a dataset.
-- Query schema information using INFORMATION_SCHEMA
SELECT
    table_name, column_name, data_type
FROM
    `my_project.my_dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE
    table_name = 'my_table';

In this query, the INFORMATION_SCHEMA.COLUMNS table is used to query schema information for columns in the specified dataset and table.

Please replace my_project, my_dataset, my_table, and other placeholders with your actual project, dataset, and table names in your BigQuery environment.

Here are query examples in BigQuery based on the specified criteria for creating and managing views:

  1. CREATE VIEW: Create a virtual table based on a query
Let's assume you want to create a view named employee_summary that summarizes data from the employee table.
-- Create a view to summarize employee data
CREATE OR REPLACE VIEW my_dataset.employee_summary AS
SELECT department, COUNT(*) AS total_employees, AVG(salary) AS avg_salary
FROM my_dataset.employee
GROUP BY department;

In this query, the CREATE OR REPLACE VIEW statement creates a view named employee_summary that summarizes employee data using an aggregation query.

  2. DROP VIEW: Delete a view
Let's assume you want to delete the view named employee_summary.
-- Delete a view
DROP VIEW my_dataset.employee_summary;

This query uses the DROP VIEW statement to delete the employee_summary view.

Please replace my_dataset, employee_summary, and other placeholders with your actual dataset and view names in your BigQuery environment.

Here are query examples in BigQuery based on the specified criteria for user and access management:

  1. GRANT: Grant permissions to users or groups
Let's assume you want to grant the user user@example.com read access to a dataset named my_dataset. In BigQuery, GRANT assigns an IAM role (here, roles/bigquery.dataViewer) on a resource.
-- Grant read access to a user
GRANT `roles/bigquery.dataViewer`
ON SCHEMA my_dataset
TO 'user:user@example.com';

In this query, the GRANT statement gives the user user@example.com the BigQuery Data Viewer role on the my_dataset dataset.

  2. REVOKE: Revoke permissions from users or groups
Let's assume you want to revoke that read access from user@example.com on the dataset my_dataset.
-- Revoke read access from a user
REVOKE `roles/bigquery.dataViewer`
ON SCHEMA my_dataset
FROM 'user:user@example.com';

This query uses the REVOKE statement to remove the BigQuery Data Viewer role from the user user@example.com on the my_dataset dataset.

Please replace my_dataset, user@example.com, and other placeholders with your actual dataset name and user/group identifiers in your BigQuery environment.
