bibliometrixExtra: R for synonyms

:::success
The custom package bibliometrixExtra (Extended functions for bibliometrix) collects these functions and makes synonym replacement easier.
:::

:::spoiler older info

Synonyms need to be replaced, but termExtraction() in R bibliometrix does not handle this smoothly.

★ Replace synonyms by importing a CSV file: syn_replace()

(Copy, paste, and run this function.) The custom str_replace_multiple can replace multiple patterns, but that is not enough. We want a replacement function that works like the one in VOSviewer, and syn_replace does that.

==Version 3 replaces only full terms, including terms that contain ‘-’ or ‘ ’ (space).==

Example:

If we want to replace ‘citation’ with ‘apple’, then

  • ‘cocitation’ –> ‘cocitation’
  • ‘co-citation’ –> ‘co-citation’
  • ‘co citation’ –> ‘co citation’
  • ‘citation’ –> ‘apple’
## version 3
syn_replace <- function(df, data, tag){
  # Protect term-internal hyphens and spaces with underscores so that the
  # word-boundary anchors \\< and \\> only match whole terms.
  protect <- function(x) gsub('\\s', '_', gsub('-', '_-_', x))
  x <- protect(data[[tag]])
  x <- gsub('^_', ' ', x)    # restore a leading space
  x <- gsub(';_', '; ', x)   # restore the space after each ';' separator
  for (i in seq_len(nrow(df))){
    x <- gsub(
      # protect the pattern the same way, so multi-word patterns can match
      pattern = paste0('\\<', protect(df[['pattern']][i]), '\\>'),
      replacement = df[['replace']][i],
      x = x)
  }
  x <- gsub('_-_', '-', x)   # restore hyphens
  x <- gsub('_', ' ', x)     # restore spaces
  return(x)
}
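
The four cases in the example can be checked on a plain character vector. The sketch below is a stripped-down illustration of the same underscore-and-word-boundary idea, not the packaged function; `replace_whole_term` is a hypothetical helper name, and it assumes that R's default regex engine treats `_` as a word character.

```r
# Hypothetical helper: whole-term replacement on a character vector.
replace_whole_term <- function(x, pattern, replacement) {
  # Join the parts of a compound term with underscores so that
  # \\< and \\> cannot match inside it.
  protect <- function(s) gsub("\\s", "_", gsub("-", "_-_", s))
  x <- gsub(paste0("\\<", protect(pattern), "\\>"), replacement, protect(x))
  x <- gsub("_-_", "-", x, fixed = TRUE)  # restore hyphens
  gsub("_", " ", x, fixed = TRUE)         # restore spaces
}

terms <- c("cocitation", "co-citation", "co citation", "citation")
result <- replace_whole_term(terms, "citation", "apple")
print(result)
```

Only the last element, the stand-alone term, is replaced; the three compound terms come back unchanged.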

:::

★ Usage

remotes::install_github('tsai-jiewen/bibliometrixExtra')
library(bibliometrixExtra) # syn_export(), syn_replace()
library(bibliometrix)
library(tidyverse)

1. Use syn_export() to export a CSV file from the bibliometrix data.

(from the custom package bibliometrixExtra)

data(scientometrics, package = "bibliometrixData")

# make a frequency table of the terms, then export it as a csv file
syn_export(
    file = 'test1124.csv',   # export file name
    data = scientometrics,  # the biblio data
    tag = 'ID'               # the field tag  
)
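
For orientation, a term-frequency table of this kind can be approximated in base R. This is only a sketch of the idea, not the package's implementation: the toy data, the `;` separator, and the column names `pattern` and `freq` are assumptions made for illustration.

```r
# Toy stand-in for a bibliometrix data frame (assumed ';'-separated keywords).
toy <- data.frame(
  ID = c("CITATION ANALYSIS;CITATIONS", "CITATIONS; SCIENCE"),
  stringsAsFactors = FALSE
)

# Split each record into terms, trim whitespace, and count occurrences.
terms <- trimws(unlist(strsplit(toy$ID, ";", fixed = TRUE)))
term_table <- as.data.frame(table(terms), stringsAsFactors = FALSE)
names(term_table) <- c("pattern", "freq")   # hypothetical column names

# Sort so the most frequent terms come first (ties in alphabetical order).
term_table <- term_table[order(-term_table$freq, term_table$pattern), ]
rownames(term_table) <- NULL

# Write the table to a temporary CSV file that can be edited by hand.
csv_path <- tempfile(fileext = ".csv")
write.csv(term_table, csv_path, row.names = FALSE)
print(term_table)
```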

The exported file is a frequency table of the terms, which you then edit by hand:

:::spoiler older info

  • delete col A (number)

  • rename col B to ‘pattern’ (MUST!)

  • rename col C to ‘replace’ (MUST!), then delete all the numbers in it.

:::

  • ==Fill in col C (replace) with the replacement term for each row, one by one.==

  • ==Delete the rows in col B (pattern) whose terms do not need to change.==

For example, to change both ‘CITATIONS’ and ‘CITATION ANALYSIS’ to ‘APPLE’, keep only those two rows and enter ‘APPLE’ as the replacement for each.

2. Use syn_import() to read the edited CSV file into R.

DTF <- syn_import(file = 'test1124.csv')

Check the result:

> DTF
            pattern  replace
1         CITATIONS    APPLE
2 CITATION ANALYSIS    APPLE
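
A base-R sketch of the same import-and-check step may help show what the edited file needs to contain. It assumes only that the file has `pattern` and `replace` columns as in the table above; `read_synonyms` is a hypothetical helper, not the package's syn_import().

```r
# Hypothetical helper: read an edited synonym file and validate its columns.
read_synonyms <- function(file) {
  df <- read.csv(file, stringsAsFactors = FALSE)
  required <- c("pattern", "replace")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Drop rows that have no replacement filled in.
  keep <- !is.na(df$replace) & nzchar(trimws(df$replace))
  df <- df[keep, required]
  rownames(df) <- NULL
  df
}

# Toy file mirroring the example table, plus one row left unedited.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("pattern,replace",
             "CITATIONS,APPLE",
             "CITATION ANALYSIS,APPLE",
             "SCIENCE,"), csv_path)
DTF_check <- read_synonyms(csv_path)
print(DTF_check)
```

Failing early on a missing column is a deliberate choice here: a misnamed header would otherwise surface only later, when the replacement loop finds nothing to replace.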

3. Use syn_replace() to replace the synonyms in the original data.

(from the custom package bibliometrixExtra)

Take ‘ID’ as an example. You can switch to ‘DE’ or another field, but keep the assigned column ($ID) and tag = 'ID' consistent.

scientometrics$ID <- syn_replace(
  df = DTF,               # the edited file you import
  data = scientometrics,  # the original biblio data
  tag = 'ID'              # the field tag  
)

:::spoiler older info

Check before: there were 8 ‘CITATIONS’, 25 ‘CITATION ANALYSIS’ and 0 ‘APPLE’ in the data frame.

> data(scientometrics, package = "bibliometrixData")
> scientometrics$ID %>% 
+   stringr::str_match(pattern ='CITATIONS') %>%
+   table()
.
CITATIONS 
        8 
        
> scientometrics$ID %>% 
+   stringr::str_match(pattern = 'CITATION ANALYSIS') %>%
+   table()
.
CITATION ANALYSIS 
               25 
               
> scientometrics$ID %>% 
+   stringr::str_match(pattern = 'APPLE') %>%
+   table()
< table of extent 0 >

Check after: there are now zero ‘CITATIONS’ and ‘CITATION ANALYSIS’, but 33 ‘APPLE’. The replacement succeeded.

> scientometrics$ID %>% 
+   stringr::str_match(pattern = 'CITATIONS') %>%
+   table()
< table of extent 0 >

> scientometrics$ID %>% 
+   stringr::str_match(pattern = 'CITATION ANALYSIS') %>%
+   table()
< table of extent 0 >

> scientometrics$ID %>% 
+   stringr::str_match(pattern = 'APPLE') %>%
+   table()
.
APPLE 
   33 

:::
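
When verifying a replacement, note that a substring match also counts longer terms that contain the pattern (for example ‘CO-CITATIONS’), and stringr::str_match() reports at most one match per record. A stricter check, sketched below in base R on toy data, splits each record into terms and counts exact matches. The `;` separator and the helper name `count_term` are assumptions.

```r
# Hypothetical helper: count exact occurrences of a term across all records.
count_term <- function(x, term, sep = ";") {
  terms <- trimws(unlist(strsplit(x, sep, fixed = TRUE)))
  sum(terms == term, na.rm = TRUE)
}

# Toy keyword fields before and after a replacement.
before <- c("CITATIONS;CO-CITATIONS", "CITATION ANALYSIS; CITATIONS")
after  <- c("APPLE;CO-CITATIONS", "APPLE; APPLE")

n_before <- count_term(before, "CITATIONS")  # exact term only, skips 'CO-CITATIONS'
n_after  <- count_term(after, "CITATIONS")
n_apple  <- count_term(after, "APPLE")
print(c(before = n_before, after = n_after, apple = n_apple))
```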

Remark

The replacement is written back to the bibliometrix data frame, so the updated terms carry over to any later analysis.

thematicEvolution()

Perform a Thematic Evolution Analysis.

# M is a bibliometrix data frame, e.g. the data after syn_replace()
nexus <- thematicEvolution(M, field = "DE", years = c(2005, 2010, 2015), n = 250, minFreq = 2)
plotThematicEvolution(nexus$Nodes, nexus$Edges)

fieldByYear()

Field Tag distribution by Year

fieldByYear(scientometrics, field = "ID", timespan = c(2005,2015), 
            min.freq = 5, n.items = 5, graph = TRUE)


:::spoiler older info (functions)

(older versions) str_replace_multiple

Only a little better than stringr::str_replace_all.

str_replace_multiple <- function(pattern_list, data, tag, replacement){
  # replace every pattern in pattern_list with the same replacement
  for (i in seq_along(pattern_list)){
    data[[tag]] <- stringr::str_replace_all(
      data[[tag]] , 
      pattern = pattern_list[i], 
      replacement = replacement)
  }
  return(data[[tag]])
}

(older versions) str_replace_df

## version 2: whole-word matching with \\< and \\>
str_replace_df <- function(df, data, tag){
  for (i in seq_len(nrow(df))){
    data[[tag]] <- gsub(
      x = data[[tag]] , 
      pattern = paste0( '\\<', df[i, 'pattern'], '\\>' ), 
      replacement = df[i, 'replace'])
  }
  return(data[[tag]])
}
## version 1: plain substring replacement, one (pattern, replace) row at a time
str_replace_df <- function(df, data, tag){
  for (i in seq_len(nrow(df))){
    data[[tag]] <- stringr::str_replace_all(
      data[[tag]] , 
      pattern = df[i, 'pattern'], 
      replacement = df[i, 'replace'])
  }
  return(data[[tag]])
}

:::

JW Tsai
PhD student in Education.