-
Notifications
You must be signed in to change notification settings - Fork 124
feat: newtab-content hourly reporting for reported content #8546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/metadata.yaml
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/metadata.yaml
Outdated
Show resolved
Hide resolved
...fx-data-shared-prod/firefox_desktop_derived/newtab_content_reported_content_v1/metadata.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a corresponding view?
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
...fx-data-shared-prod/firefox_desktop_derived/newtab_content_reported_content_v1/metadata.yaml
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/metadata.yaml
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/query.sql
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/schema.yaml
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/firefox_desktop_derived/report_content_live_v1/schema.yaml
Outdated
Show resolved
Hide resolved
…tent_live_v1/schema.yaml Co-authored-by: Sean Rose <[email protected]>
…tent_live_v1/schema.yaml Co-authored-by: Sean Rose <[email protected]>
This comment has been minimized.
This comment has been minimized.
a016767 to
0cfc084
Compare
Integration report for "rename model, create view, update dag, remove newtab ping logic"
|
sean-rose
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r+wc
A couple notes about deploying this:
- New DAGs are added as paused by default and need to be manually enabled in the Airflow UI.
- When adding ETLs in DAGs that run frequently it's likely the new ETL would end up running before the artifact deployment process could properly create the table. As a result the ETL would create the table without appropriate settings like partitioning, and then the artifact deployment process would fail because BigQuery doesn't allow changing the partitioning for a table (it has to be dropped and recreated). The easiest way to avoid that is pausing the DAG (works best with catchup enabled), waiting for artifact deployment to create the table, and then re-enabling the DAG. So in this case I'd recommend simply waiting to enable the DAG in the first place until artifact deployment has properly created the table.
| # We reprocess the same day every hour up until 1:00 the following day, to give | ||
| # the live data time to come in | ||
| destination_table: >- | ||
| report_content_live_v1${{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (blocking):
| report_content_live_v1${{ | |
| newtab_content_reported_content_v1${{ |
| FROM | ||
| newtab_content_live_deduped AS e | ||
| CROSS JOIN | ||
| UNNEST(e.events) AS event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (non-blocking): e doesn't make sense to me as an alias here. Since the alias isn't really being used except in the cross join I'd suggest simply removing it.
| FROM | |
| newtab_content_live_deduped AS e | |
| CROSS JOIN | |
| UNNEST(e.events) AS event | |
| FROM | |
| newtab_content_live_deduped | |
| CROSS JOIN | |
| UNNEST(events) AS event |
| bqetl_newtab_hourly: | ||
| default_args: | ||
| depends_on_past: false | ||
| email_on_failure: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (blocking): It's recommended to have emails go to [email protected].
| email_on_failure: true | |
| email: | |
| - [email protected] | |
| - [email protected] | |
| email_on_failure: true |
Description
This PR creates the
report_contenttable utilizing the underlying live tables. This will allow the content team to view same day reporting of what content is being reported by users in order to remove if necessary.This daily job was configured following the example in this model where we are partitioning by day but rerunning that same day's data hourly to pull in any new records.
Related Tickets & Documents
Reviewer, please follow this checklist