Blog Author Tracking in Google Analytics with R

So your business is blogging to get new traffic and leads, but morale is declining because writers can't easily see the evidence of their work. Google Analytics has many relevant metrics for individual web pages, but it doesn't record who wrote a post. This makes aggregating blog posts by author a pain, particularly for companies like Moz, which likely have hundreds of blog posts around the web.

Enter R. If you're not using R to analyze Google Analytics data, there are quite a few tutorials out there for getting started. With a few of its packages, we can easily get author-level Google Analytics data.

 

library(tidyverse)
library(rvest)
library(googleAnalyticsR)

 

We start by querying the Google Analytics data we want. Here, the only metric I want to see is sessions for each blog post. Luckily, our URL structure is such that every blog post has "blog/" in its URL.

 

# Built-in "Organic Traffic" segment
organic <- segment_ga4("organic", segment_id = "gaid::-5")

# Keep only landing pages whose path contains "blog/"
blog_filter <- dim_filter("landingPagePath", "REGEXP", "blog/") %>%
  list() %>%
  filter_clause_ga4()

pages <- google_analytics(ga_id,              # your view ID here
                          date_range = range, # your date range
                          dimensions = c("landingPagePath"),
                          metrics = c("sessions"),
                          dim_filters = blog_filter,
                          segments = organic)
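
The query above assumes that `ga_id` and `range` already exist in your session. As a minimal sketch, they might be set up as follows; the view ID is a made-up placeholder and the 30-day window is only an assumption, so substitute your own values.

```r
# Hypothetical placeholder values; replace them with your own.
ga_id <- 123456789                           # made-up view ID
range <- c(Sys.Date() - 30, Sys.Date() - 1)  # assumed window: the last 30 full days

# Authentication opens a browser window, so only attempt it interactively.
if (interactive()) {
  googleAnalyticsR::ga_auth()
  # googleAnalyticsR::ga_account_list() lists the views you can access.
}
```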


Google Analytics will return a list of URL paths. We append each path to the host name to build the full URL of the post, which is the page we'll read the author name from.

pages <- pages %>% mutate(page = paste0("https://www.alloymagnetic.com",landingPagePath))

To retrieve the author names, the rvest package comes in handy. The author element on each page can be identified by its CSS selector.

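# Fetch each page and extract the author text; try() keeps one failed
# request from stopping the whole run
pages <- pages %>% mutate(author = map(page, ~try(read_html(.) %>%
                                           html_node("span.author") %>%
                                           html_text(), silent = TRUE)) %>% unlist())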

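It's likely that you'll get HTTP status errors for some of the pages, particularly if some of your posts were unpublished during your date range. Because each request is wrapped in try(), the error message ends up in the author column instead of stopping the run, which is why the final step drops any "author" containing "404".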
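
To see what the scraping step does without touching a live site, here is a small offline sketch. The HTML string and the names `post`, `failed`, and `safe_author` are made up for illustration. The last part swaps `try()` for purrr's `possibly()`, which is an alternative rather than the approach used in this post: it returns `NA` for failed pages instead of an error string.

```r
library(rvest)
library(purrr)

# Hypothetical stand-in for a blog post; a real page would be read from its URL.
post <- read_html('<html><body><span class="author"> by Jane Doe</span></body></html>')
author <- post %>% html_node("span.author") %>% html_text()

# try() turns an error into a character object holding the error message,
# which is what lands in the author column when a request fails.
failed <- try(stop("HTTP error 404."), silent = TRUE)

# Alternative (an assumption, not the method above): return NA on failure.
safe_author <- possibly(
  function(url) read_html(url) %>% html_node("span.author") %>% html_text(),
  otherwise = NA_character_
)
missing_author <- safe_author("no-such-file.html")  # unreadable path, so NA
```

With `possibly()`, failed pages could be removed later by checking for `NA` rather than by matching on the text "404".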

Finally, we group the landing page traffic by author to get our list of writers.

pages %>%
  group_by(author) %>%
  summarise(sessions = sum(sessions),
            `# of blogs` = n()) %>%
  filter(!str_detect(author, "404")) %>%           # drop pages that returned a 404
  mutate(author = str_remove(author, " by ")) %>%  # strip the " by " byline prefix
  arrange(desc(sessions))
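
As a quick check of the aggregation logic, the same pipeline can be run on a tiny hand-made table. Every row below is invented for illustration, including the author names and the error string; the real `pages` table comes from the query and scraping steps above.

```r
library(dplyr)
library(stringr)

# Made-up rows standing in for the scraped `pages` table.
pages <- tibble(
  landingPagePath = c("/blog/a", "/blog/b", "/blog/c", "/blog/d"),
  sessions = c(120, 80, 45, 10),
  author = c(" by Jane Doe", " by John Roe", " by Jane Doe",
             "Error in open.connection(x, \"rb\") : HTTP error 404.\n")
)

by_author <- pages %>%
  group_by(author) %>%
  summarise(sessions = sum(sessions),
            `# of blogs` = n()) %>%
  filter(!str_detect(author, "404")) %>%
  mutate(author = str_remove(author, " by ")) %>%
  arrange(desc(sessions))

print(by_author)
```

In this toy table the two Jane Doe posts collapse into one row with their sessions summed, and the 404 row is dropped.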

 

This is dummy data

 
