Title: | Legislative Speeches |
---|---|
Description: | Converts the floor speeches of Uruguayan legislators, extracted from the parliamentary minutes, to a tidy data.frame in which each observation is the intervention of a single legislator. |
Authors: | Nicolas Schmidt [aut, cre], Diego Lujan [aut], Juan Andres Moraes [aut], Elina Gomez [ctb] |
Maintainer: | Nicolas Schmidt <[email protected]> |
License: | GPL-3 |
Version: | 0.1.6 |
Built: | 2024-11-19 04:56:06 UTC |
Source: | https://github.com/nicolas-schmidt/speech |
Extracts the individual speeches of each legislator in a document and returns them as a data.frame.
speech_build(
  file,
  add.error.sir = NULL,
  rm.error.leg = NULL,
  compiler = FALSE,
  quality = FALSE,
  param = list(char = 6500, drop.page = 2)
)
file |
list or character vector specifying the path or URL to a PDF file. It can be one or more files. |
add.error.sir |
character vector. Additional ways in which the term that orders the speeches (sir) could be miswritten. By default it is NULL. |
rm.error.leg |
character vector. Legislators' names to be eliminated. By default it is NULL. |
compiler |
logical. Once the conversion from PDF to data frame has been checked, the data frame must be compiled. Compiling means uniting all the speeches of each legislator within each document. Because this operation must be carried out after making corrections, it has to be requested explicitly. By default it is FALSE. |
quality |
logical. If TRUE, the variables index_1 and index_2 are added to the result. By default it is FALSE. |
param |
list of length 2 with the magnitudes char ("character for page") and drop.page ("drop page non evaluate"), in that order. The default values are the median characters of the 8500 documents that make up the speech datasets. |
This function converts PDF documents to a data.frame. The conversion works
by locating the interventions of legislators from the word "SENOR". Because the
quality of PDF files is not always good, it is recommended to verify that
no legislator is omitted while the data.frame is built. Use the argument
add.error.sir to correct miswritten forms of the word "SENOR". The function
already includes a long list of ways in which the word "SENOR" may be written
in a document, but it cannot cover every future case. When the PDF document is
a scan that was processed with OCR, check the result with greater caution to
ensure that the operation was performed correctly.
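The marker-based splitting described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper `split_by_marker()`, the sample text, and the assumption that every intervention starts with the marker, an upper-case name and ".-" are all hypothetical. It only shows why adding a miswritten form such as "SEf'IOR" changes which interventions are found.

```r
# Hypothetical helper (not part of the speech package).
# `markers` stands in for the known spellings plus those given in add.error.sir.
split_by_marker <- function(text, markers = "SENOR") {
  # One alternation with every accepted spelling of the marker
  alt <- paste(markers, collapse = "|")
  # Assumed header layout: MARKER NAME.-
  pattern <- paste0("(", alt, ") ([[:upper:]]+)\\.-")
  m <- gregexpr(pattern, text)
  headers <- regmatches(text, m)[[1]]
  # Text between consecutive headers; the piece before the first header is dropped
  bodies <- regmatches(text, m, invert = TRUE)[[1]][-1]
  data.frame(
    legislator = sub(pattern, "\\2", headers),
    speech = trimws(bodies),
    stringsAsFactors = FALSE
  )
}

# Made-up text with one clean marker and one OCR-damaged marker
txt <- "SENOR ABDALA.- Pido la palabra. SEf'IOR LAZO.- Voto por la afirmativa."

only_default <- split_by_marker(txt)
with_extra <- split_by_marker(txt, markers = c("SENOR", "SEf'IOR"))
```

With the default marker only, the second intervention stays attached to the first legislator; with the extra spelling, both interventions are separated.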
data.frame of class puy with the following variables:
legislator: name of the legislator
speech: speech by the legislator
date: session date
id: file name
legislature: legislature id (period of government)
sex: sex
chamber: chamber to which the document belongs. It can be: Chamber of Representatives, Senate, General Assembly or Permanent Commission.
If quality is TRUE, the following are added:
index_1: index_1
index_2: index_2
# url <- speech::speech_url(chamber = "C", from = "17-09-2019", to = "17-09-2019")
# out <- speech_build(file = url)
# out <- speech_build(file = url, compiler = FALSE,
#                     quality = TRUE,
#                     add.error.sir = c("SEf'IOR"),
#                     rm.error.leg = c("PRtSIDENTE", "SUB", "PRfSlENTE"),
#                     param = list(char = 6000, drop.page = 3))
# out <- list.files(pattern = "*.pdf") %>% speech_build()
# out <- list.files(pattern = "*.pdf") %>%
#        speech_build(., compiler = TRUE, param = list(char = 4500, drop.page = 3))
Checks that the legislators' names are correctly written before the documents are compiled with speech_build.
speech_check(tidy_speech, initial, expand = FALSE)
tidy_speech |
data.frame. |
initial |
character vector. Initial of the legislators' names. If no initial is entered, all will be checked. |
expand |
logical. By default it is FALSE. |
list with a data.frame for each initial of legislators' names.
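To make the shape of this return value concrete, the following base-R sketch groups the distinct spellings of legislators' names by their initial. It is an illustration under assumptions, not the package's code: `names_by_initial()` and the sample data are hypothetical, and the real function may report different columns.

```r
# Hypothetical stand-in for the output of speech_build(); only the
# legislator column matters for this illustration.
out <- data.frame(
  legislator = c("ABDALA", "ABDALA", "ABDALLA", "MUJICA", "LAZO"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count each spelling and group the counts by initial.
names_by_initial <- function(df, initial = NULL) {
  counts <- as.data.frame(
    table(legislator = df$legislator),
    responseName = "n",
    stringsAsFactors = FALSE
  )
  groups <- split(counts, substr(counts$legislator, 1, 1))
  # Keep only the requested initials; NULL means all of them
  if (!is.null(initial)) {
    groups <- groups[intersect(initial, names(groups))]
  }
  groups
}

chk <- names_by_initial(out, initial = c("A", "M"))
```

Here `chk$A` lists both "ABDALA" and "ABDALLA", which is the kind of near-duplicate spelling worth correcting before compiling.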
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_check(out, initial = c("A", "M"), expand = FALSE)
Modifies a legislator's name prior to compiling the data.
speech_legis_replace(tidy_speech, old, new, id = NULL)
tidy_speech |
data.frame of class puy. |
old |
old legislator's name. |
new |
new legislator's name. |
id |
id of the floor speech. By default it is NULL. |
data.frame.
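A minimal sketch of what such a replacement amounts to, written in base R. The helper `replace_name_sketch()` and the sample data are hypothetical, and the assumption that `id` restricts the change to the listed documents is an interpretation of the argument description, not confirmed behaviour of the package.

```r
# Hypothetical stand-in for an uncompiled speech_build() result
out <- data.frame(
  legislator = c("GOI", "GOI", "LAZO"),
  id = c("doc1.pdf", "doc2.pdf", "doc1.pdf"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename `old` to `new`, optionally only in some documents.
replace_name_sketch <- function(df, old, new, id = NULL) {
  hit <- df$legislator == old
  # Assumption: a non-NULL id limits the replacement to those documents
  if (!is.null(id)) {
    hit <- hit & df$id %in% id
  }
  df$legislator[hit] <- new
  df
}

all_docs <- replace_name_sketch(out, old = "GOI", new = "GONI")
one_doc <- replace_name_sketch(out, old = "GOI", new = "GONI", id = "doc1.pdf")
```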
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_check(out, "G")
# out <- speech_legis_replace(out, old = "GOI", new = "GONI")
Recompiles the speech datasets, or a data.frame built with speech_build, to which the political party variable was added.
speech_recompiler(
  tidy_speech,
  compiler_by = c("legislator", "legislature", "chamber", "date", "id", "sex")
)
tidy_speech |
data.frame. |
compiler_by |
character vector. Variables for which you may want to recompile the data frame. |
The default compilation is that of speech_build(., compiler = TRUE). This function recompiles the data at different levels of aggregation: chamber, legislature or other variables.
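Recompiling at a different level of aggregation can be pictured as pasting together the speeches that share the same values of the grouping variables. The base-R sketch below illustrates that idea only; `recompile_sketch()` and the sample data are hypothetical, and the package may implement the operation differently.

```r
# Hypothetical stand-in for an uncompiled data.frame of speeches
out <- data.frame(
  legislator = c("ABDALA", "LAZO", "ABDALA"),
  legislature = c(48, 48, 48),
  chamber = "Chamber of Representatives",
  speech = c("Pido la palabra.", "Voto por la afirmativa.", "Gracias."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unite the speeches within each combination of `by`.
recompile_sketch <- function(df, by = c("legislator", "legislature", "chamber")) {
  aggregate(df["speech"], by = df[by], FUN = paste, collapse = " ")
}

res <- recompile_sketch(out)
```

In this sketch the two interventions of "ABDALA" end up in a single row whose speech is their concatenation, in the original order.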
data.frame.
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# out2 <- speech_recompiler(out)
# out2 <- speech_recompiler(out, compiler_by = c("legislator", "legislature", "chamber"))
Detects roll-call votes in floor speeches and converts them to a dataset.
Returns a summary of a rollcall vote object.
speech_rollcall(file, add.error.sir = NULL, rm.error.leg = NULL)

## S3 method for class 'nominal'
summary(object, ...)
file |
list or character vector specifying the path or URL to a PDF file. It can be one or more files. |
add.error.sir |
character vector. Additional ways in which the term that orders the speeches (sir) could be miswritten. By default it is NULL. |
rm.error.leg |
character vector. Legislators' names to be eliminated. By default it is NULL. |
object |
an object of class nominal. |
... |
additional parameter. |
This function detects roll-call votes in floor speeches. It only detects votes in which the vote can be affirmative or negative. This leaves out a set of roll-call votes, such as those for the allocation of positions in the chamber.
speech_rollcall returns a data.frame with the following variables:
legislator: name of the legislator
vote: vote, 1 = affirmative, 0 = negative
argument: 1 if the legislator justifies the vote, 0 otherwise
speech: speech
chamber: chamber
date: date
legislature: legislature
rollcall: number of the roll-call in the session
id: id
sex: sex of the legislator
summary returns a data.frame with the following variables:
Chamber: chamber
Date: date
Legislators: number of legislators in the voting
Affirmative: number of affirmative votes
Negative: number of negative votes
prop_AF: proportion of affirmative votes
prop_NG: proportion of negative votes
prop_women: proportion of women in the voting
prop_arg: proportion of legislators justifying the vote
rc: number of the roll-call in the session
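The summary variables follow directly from the vote-level variables. As a rough illustration, the base-R sketch below derives the counts and proportions for a single roll-call from made-up data. `summary_sketch()` is hypothetical and is not the package's summary method; prop_women is left out because the coding of sex is not stated here.

```r
# Made-up votes for one roll-call, using the coding described above:
# vote 1 = affirmative, 0 = negative; argument 1 = vote was justified.
votes <- data.frame(
  legislator = c("A", "B", "C", "D"),
  vote = c(1, 1, 0, 1),
  argument = c(0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one-row summary of a single roll-call.
summary_sketch <- function(df) {
  data.frame(
    Legislators = nrow(df),
    Affirmative = sum(df$vote == 1),
    Negative = sum(df$vote == 0),
    prop_AF = mean(df$vote == 1),
    prop_NG = mean(df$vote == 0),
    prop_arg = mean(df$argument == 1)
  )
}

s <- summary_sketch(votes)
```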
# url <- speech::speech_url(chamber = "D", from = "14-04-2004", to = "14-04-2004")
# out <- speech_rollcall(file = url)
# summary(out)
Undoes the compilation of a floor speech.
speech_uncompiler(tidy_speech)
tidy_speech |
data.frame. |
data.frame.
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url, compiler = TRUE)
# out2 <- speech_uncompiler(out)
Creates a vector of URLs to download for a period within a legislature.
speech_url(chamber, from, to, days = NULL)
chamber |
chamber:
|
from |
character vector. Date in YYYY-MM-DD format |
to |
character vector. Date in YYYY-MM-DD format |
days |
character vector. Date in YYYY-MM-DD format. |
character vector
# speech_url(chamber = "D",
#            from = "2015-02-15",
#            to = "2015-03-15")
#
# speech_url(chamber = "D",
#            days = "2015-02-15")
#
# speech_url(chamber = "D",
#            days = c("2002-06-12", "2004-04-14"))
Shows the legislators' names that have problems prior to compiling the data.
speech_view(tidy_speech, legis = character(), view = FALSE)
tidy_speech |
data.frame of class puy. |
legis |
name of the legislator. |
view |
logical. By default it is FALSE. |
data.frame.
# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url)
# speech_view(tidy_speech = out, legis = c("ABDALA", "LAZO"), view = FALSE)
Word count.
speech_word_count(
  string,
  rm.name = FALSE,
  exclude = NULL,
  min.char = 0L,
  rm.long = Inf,
  rm.num = FALSE,
  replace.punct = ""
)
string |
character of length equal to or greater than one. |
rm.name |
logical. By default it is FALSE. |
exclude |
words to be excluded from the count. By default it is NULL. |
min.char |
integer. Words with fewer than this number of characters are left out of the count. By default it is 0. |
rm.long |
integer. Number of characters from which words are removed from the count. By default it is Inf. |
rm.num |
logical. Indicates whether numbers are removed from the count. By default it is FALSE. |
replace.punct |
character that replaces punctuation marks. By default it is "". |
integer.
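The effect of replace.punct is easiest to see with a small counter written from scratch. The base-R sketch below is not the package's implementation: `count_words_sketch()` is hypothetical and covers only punctuation replacement, on the assumption that punctuation is substituted before the text is split on whitespace. The values in the comments are the outputs of this sketch, not of speech_word_count.

```r
# Hypothetical helper: replace punctuation, then count whitespace-separated tokens.
count_words_sketch <- function(string, replace.punct = "") {
  cleaned <- gsub("[[:punct:]]", replace.punct, string)
  tokens <- strsplit(trimws(cleaned), "[[:space:]]+")
  lengths(tokens)
}

a <- count_words_sketch("Hello world!")                       # 2
b <- count_words_sketch("Hello.world!")                       # 1: the dot is deleted
c2 <- count_words_sketch("Hello.world!", replace.punct = " ") # 2: the dot becomes a space
```

Deleting punctuation glues "Hello.world" into one token, whereas replacing it with a space keeps the two words apart.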
vec <- "Hello world!"
speech_word_count(vec)
vec2 <- "Hello.world!"
speech_word_count(vec2)
speech_word_count(vec2, replace.punct = " ")
vec3 <- "Hello.world!, HelloHelloHelloHelloHelloHello"
speech_word_count(vec3, replace.punct = " ", rm.long = 20)
speech_word_count("R version", min.char = 1)
r <- "R version 3.5.2 (2018-12-20) -- 'Eggshell Igloo'"
speech_word_count(r, rm.num = TRUE)
speech_word_count(NA)

# url <- "http://bit.ly/35AUVF4"
# out <- speech_build(file = url, compiler = TRUE)
# out$word <- speech_word_count(out$speech, rm.name = TRUE)
# out$word2 <- speech_word_count(out$speech)