Generate publications page
I have been updating my publications page by editing markdown for several years. Today I updated my self-citation database, TDH-refs.bib, so that is is consistent with that publications page. In this post we explore the extent to which it would be possible to generate the publications page, using the bib file as a source.
Parse bib into R
Parsing bibtex files is easy using regex. In fact, that is one of the
examples on ?nc::capture_all_str
:
refs.bib <- "~/tdhock.github.io/assets/TDH-refs.bib"
refs.vec <- readLines(refs.bib)
## Warning in readLines(refs.bib): ligne finale incomplète trouvée dans '~/tdhock.github.io/assets/TDH-refs.bib'
at.lines <- grep("^@", refs.vec, value=TRUE)
str(at.lines)
## chr [1:66] "@unpublished{Gurney2024power," "@unpublished{Nguyen2024penalty," "@unpublished{Truong2024circular," ...
The output above shows that there are currently 66 lines that start
with @
in the bib file. Below we use a regex to convert each item
into one row of a data table:
refs.dt <- nc::capture_all_str(
refs.vec,
"@",
type="[^{]+", tolower,
"[{]",
ref="[^,]+",
",\n",
fields="(?:.*\n)+?.*",
"[}]\\s*(?:$|\n)")
str(refs.dt)
## Classes 'data.table' and 'data.frame': 66 obs. of 3 variables:
## $ type : chr "unpublished" "unpublished" "unpublished" "unpublished" ...
## $ ref : chr "Gurney2024power" "Nguyen2024penalty" "Truong2024circular" "Hocking2024mlr3resampling" ...
## $ fields: chr " title={Assessment of the Climate Trace global powerplant CO2 emissions},\n author={Kevin Gurney and Bilal As"| __truncated__ " title={Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization},\n author={Nguyen, "| __truncated__ " title={Efficient change-point detection for multivariate circular data},\n author={Charles Truong and Toby D"| __truncated__ " title={mlr3resampling: an R implementation of cross-validation for comparing models learned using different t"| __truncated__ ...
## - attr(*, ".internal.selfref")=<externalptr>
The output above shows that the bib file was converted to a table with 66 rows.
Parsing fields
First we look at the number of lines with an equals sign, each of which is probably a field.
eq.lines <- grep("=", refs.vec, value=TRUE)
str(eq.lines)
## chr [1:511] " title={Assessment of the Climate Trace global powerplant CO2 emissions}," ...
Above we see 511 fields.
Below we parse the fields
column:
strip <- function(x)gsub("^\\s*|,\\s*$", "", gsub('[{}"]', "", x))
field.pattern <- list(
"\\s*",
variable="[^= ]+", tolower,
"\\s*=",
value=".*", strip)
(refs.fields <- refs.dt[, nc::capture_all_str(
fields, field.pattern),
by=.(type, ref)])
## type ref variable
## <char> <char> <char>
## 1: unpublished Gurney2024power title
## 2: unpublished Gurney2024power author
## 3: unpublished Gurney2024power note
## 4: unpublished Gurney2024power year
## 5: unpublished Nguyen2024penalty title
## ---
## 507: article Doyon2008heritable number
## 508: article Doyon2008heritable pages
## 509: article Doyon2008heritable year
## 510: article Doyon2008heritable links
## 511: article Doyon2008heritable publisher
## value
## <char>
## 1: Assessment of the Climate Trace global powerplant CO2 emissions
## 2: Kevin Gurney and Bilal Aslam and Pawlok Dass and L Gawuc and Toby Dylan Hocking and Jarrett J Barber and Anna Kato
## 3: Under review at Environmental Research Letters
## 4: 2024
## 5: Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization
## ---
## 507: 6
## 508: 702--708
## 509: 2008
## 510: [Pubmed](http://www.ncbi.nlm.nih.gov/pubmed/18500334)
## 511: Nature Publishing Group US New York
Above we see 511 fields, consistent with the simpler grep
parsing above.
If it is not consistent, we can use the code below to find out where:
(eq.dt <- nc::capture_first_vec(eq.lines, field.pattern))
## variable
## <char>
## 1: title
## 2: author
## 3: note
## 4: year
## 5: title
## ---
## 507: number
## 508: pages
## 509: year
## 510: links
## 511: publisher
## value
## <char>
## 1: Assessment of the Climate Trace global powerplant CO2 emissions
## 2: Kevin Gurney and Bilal Aslam and Pawlok Dass and L Gawuc and Toby Dylan Hocking and Jarrett J Barber and Anna Kato
## 3: Under review at Environmental Research Letters
## 4: 2024
## 5: Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization
## ---
## 507: 6
## 508: 702--708
## 509: 2008
## 510: [Pubmed](http://www.ncbi.nlm.nih.gov/pubmed/18500334)
## 511: Nature Publishing Group US New York
eq.dt[!refs.fields, on=.(variable,value)]
## variable value
## <char> <char>
## 1: doi 10.1007/s11222-016-9636-3
eq.counts <- eq.dt[, .(eq.count=.N), by=.(variable,value)]
refs.fields[, .(ref.count=.N), by=.(variable,value)][eq.counts,on=.(variable,value)][eq.count!=ref.count]
## Empty data.table (0 rows and 4 cols): variable,value,ref.count,eq.count
Verify clean
Normally there should not be any quotes or curly braces in fields:
cat(grep('[{}"]', refs.fields$value, value=TRUE), sep="\n\n")
Formatting the parsed data
The publications page is organized as follows
- chronologically, newest on top.
- heading
###
for in progress, and each year. - bullet
-
for each publication.
Each publication has
- Names like Last1 F1, Last2 F2.
- Title then period.
- Venue/publisher.
- then links.
The links at the end are not stored in the bib file, so that is not possible to output. (exercise for reader!)
But we can get the other info.
Venues
library(data.table)
## data.table 1.15.99 IN DEVELOPMENT built 2024-05-02 04:20:26 UTC using 1 threads (see ?getDTthreads). Latest news: r-datatable.com
## **********
## This development version of data.table was built more than 4 weeks ago. Please update: data.table::update_dev_pkg()
## **********
refs.wide <- dcast(refs.fields, type + ref ~ variable)
refs.wide[, .(
type,
year,
journal=substr(journal,1,10),
vol=volume, num=number,
booktitle=substr(booktitle,1,10),
note=substr(note,1,10),
school=substr(school,1,10))]
## type year journal vol num booktitle note school
## <char> <char> <char> <char> <char> <char> <char> <char>
## 1: article 2021 Functional 35 4 <NA> <NA> <NA>
## 2: article 2018 The Americ 103 4 <NA> <NA> <NA>
## 3: article 2022 Journal of 31 4 <NA> <NA> <NA>
## 4: article 2024 Journal of 1 3 <NA> e2024JH000 <NA>
## 5: article 2022 ACM Trans. 29 2 <NA> <NA> <NA>
## 6: article 2016 Clinical C 22 22 <NA> <NA> <NA>
## 7: article 2018 JNCI: Jour 110 10 <NA> <NA> <NA>
## 8: article 2018 Scientific 5 1 <NA> <NA> <NA>
## 9: article 2008 Nature bio 26 6 <NA> <NA> <NA>
## 10: article 2021 Computers 130 <NA> <NA> <NA> <NA>
## 11: article 2010 PLoS one 5 8 <NA> <NA> <NA>
## 12: article 2023 IEEE Robot 8 8 <NA> <NA> <NA>
## 13: article 2023 Journal of 24 70 <NA> <NA> <NA>
## 14: article 2013 BMC Bioinf 14 164 <NA> <NA> <NA>
## 15: article 2013 Journal of 54 <NA> <NA> <NA> <NA>
## 16: article 2014 Bioinforma 30 11 <NA> <NA> <NA>
## 17: article 2017 Bioinforma 33 4 <NA> <NA> <NA>
## 18: article 2019 The R Jour 11 2 <NA> <NA> <NA>
## 19: article 2020 Journal of 21 87 <NA> <NA> <NA>
## 20: article 2021 The R Jour 13 1 <NA> <NA> <NA>
## 21: article 2022 Journal of 101 10 <NA> <NA> <NA>
## 22: article 2023 Computatio 38 <NA> <NA> <NA> <NA>
## 23: article 2019 Biostatist 21 4 <NA> <NA> <NA>
## 24: article 2024 Journal of <NA> <NA> <NA> <NA> <NA>
## 25: article 2021 BMC Bioinf 22 323 <NA> <NA> <NA>
## 26: article 2017 Statistics 27 <NA> <NA> <NA> <NA>
## 27: article 2022 Biology Me 7 1 <NA> <NA> <NA>
## 28: article 2023 Journal of 106 6 <NA> <NA> <NA>
## 29: article 2016 Leukemia 30 7 <NA> <NA> <NA>
## 30: article 2019 Journal of 28 2 <NA> <NA> <NA>
## 31: article 2014 Cancer Sci 105 7 <NA> <NA> <NA>
## 32: article 2023 Nature 618 <NA> <NA> DOI:10.103 <NA>
## 33: article 2024 Nature 627 8002 <NA> <NA> <NA>
## 34: article 2022 Journal of 31 2 <NA> <NA> <NA>
## 35: incollection 2022 <NA> <NA> <NA> Land Carbo <NA> <NA>
## 36: inproceedings 2017 <NA> <NA> <NA> Advances i <NA> <NA>
## 37: inproceedings 2020 <NA> <NA> <NA> 2020 54th <NA> <NA>
## 38: inproceedings 2020 <NA> <NA> <NA> 2020 42nd <NA> <NA>
## 39: inproceedings 2011 <NA> <NA> <NA> 28th inter <NA> <NA>
## 40: inproceedings 2013 <NA> <NA> <NA> Proc. 30th <NA> <NA>
## 41: inproceedings 2015 <NA> <NA> <NA> Proc. 32nd <NA> <NA>
## 42: inproceedings 2020 <NA> 25 <NA> Proc. Paci <NA> <NA>
## 43: inproceedings 2021 <NA> <NA> <NA> 2021 IEEE <NA> <NA>
## 44: inproceedings 2023 <NA> <NA> <NA> 2023 Inter <NA> <NA>
## 45: inproceedings 2022 <NA> <NA> <NA> 2022 Fourt <NA> <NA>
## 46: inproceedings 2022 <NA> <NA> <NA> 2022 fourt <NA> <NA>
## 47: inproceedings 2022 <NA> <NA> <NA> 2022 Fourt <NA> <NA>
## 48: phdthesis 2012 <NA> <NA> <NA> <NA> <NA> Ecole norm
## 49: unpublished 2023 <NA> <NA> <NA> <NA> Preprint a <NA>
## 50: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 51: unpublished 2024 <NA> <NA> <NA> <NA> Under revi <NA>
## 52: unpublished 2015 <NA> <NA> <NA> <NA> Preprint a <NA>
## 53: unpublished 2016 <NA> <NA> <NA> <NA> Tutorial a <NA>
## 54: unpublished 2017 <NA> <NA> <NA> <NA> Tutorial a <NA>
## 55: unpublished 2023 <NA> <NA> <NA> <NA> In progres <NA>
## 56: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 57: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 58: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 59: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 60: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 61: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 62: unpublished 2024 <NA> <NA> <NA> <NA> Preprint a <NA>
## 63: unpublished 2023 <NA> <NA> <NA> <NA> Preprint a <NA>
## 64: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 65: unpublished 2024 <NA> <NA> <NA> <NA> In progres <NA>
## 66: unpublished 2014 <NA> <NA> <NA> <NA> Preprint a <NA>
## type year journal vol num booktitle note school
As can be seen in the table above, we can use various fields to define the venue of publication:
article
: most journal articles have avolume
andnumber
. The only exception is a recently published article that has not yet been assigned a volume.incollection
andinproceedings
: can usebooktitle
.unpublished
: can usenote
.phdthesis
: can useschool
.
These rules are encoded below,
refs.wide[, venue := fcase(
type=="article", paste0(
journal, ifelse(
is.na(volume),
paste0(", DOI: ", gsub("[.]", " . ", doi)),
paste0(
" ",
volume,
ifelse(is.na(number), "", sprintf("(%s)", number))
)
)
),
type=="inproceedings", booktitle,
type=="incollection", sprintf("Chapter in %s, edited by %s, published by %s", booktitle, editor, publisher),
type=="phdthesis", paste("PHD thesis,", school),
type=="unpublished", note
)][, .(type, year, venue=substr(venue, nchar(venue)-30,nchar(venue)))]
## type year venue
## <char> <char> <char>
## 1: article 2021 Functional Ecology 35(4)
## 2: article 2018 ournal of Human Genetics 103(4)
## 3: article 2022 and Graphical Statistics 31(4)
## 4: article 2024 e Learning and Computation 1(3)
## 5: article 2022 s. Comput.-Hum. Interact. 29(2)
## 6: article 2016 Clinical Cancer Research 22(22)
## 7: article 2018 tional Cancer Institute 110(10)
## 8: article 2018 Scientific data 5(1)
## 9: article 2008 Nature biotechnology 26(6)
## 10: article 2021 ers in Biology and Medicine 130
## 11: article 2010 PLoS one 5(8)
## 12: article 2023 ics and Automation Letters 8(8)
## 13: article 2023 achine Learning Research 24(70)
## 14: article 2013 BMC Bioinformatics 14(164)
## 15: article 2013 rnal of Statistical Software 54
## 16: article 2014 Bioinformatics 30(11)
## 17: article 2017 Bioinformatics 33(4)
## 18: article 2019 The R Journal 11(2)
## 19: article 2020 achine Learning Research 21(87)
## 20: article 2021 The R Journal 13(1)
## 21: article 2022 of Statistical Software 101(10)
## 22: article 2023 Computational Statistics 38
## 23: article 2019 Biostatistics 21(4)
## 24: article 2024 1080/10618600 . 2023 . 2293216
## 25: article 2021 BMC Bioinformatics 22(323)
## 26: article 2017 Statistics and Computing 27
## 27: article 2022 logy Methods and Protocols 7(1)
## 28: article 2023 of Statistical Software 106(6)
## 29: article 2016 Leukemia 30(7)
## 30: article 2019 and Graphical Statistics 28(2)
## 31: article 2014 Cancer Sci 105(7)
## 32: article 2023 Nature 618
## 33: article 2024 Nature 627(8002)
## 34: article 2022 and Graphical Statistics 31(2)
## 35: incollection 2022 published by Taylor and Francis
## 36: inproceedings 2017 formation Processing Systems 30
## 37: inproceedings 2020 Signals, Systems, and Computers
## 38: inproceedings 2020 Medicine Biology Society (EMBC)
## 39: inproceedings 2011 conference on machine learning
## 40: inproceedings 2013 Proc. 30th ICML
## 41: inproceedings 2015 Proc. 32nd ICML
## 42: inproceedings 2020 cific Symposium on Biocomputing
## 43: inproceedings 2021 Reliability Engineering (ISSRE)
## 44: inproceedings 2023 Technology and Computing (IETC)
## 45: inproceedings 2022 Transdisciplinary AI (TransAI)
## 46: inproceedings 2022 transdisciplinary AI (TransAI)
## 47: inproceedings 2022 Transdisciplinary AI (TransAI)
## 48: phdthesis 2012 le normale supérieure de Cachan
## 49: unpublished 2023 er review at BMC Bioinformatics
## 50: unpublished 2024 In progress
## 51: unpublished 2024 Environmental Research Letters
## 52: unpublished 2015 Preprint arXiv:1509.00368
## 53: unpublished 2016 onference, textbook in progress
## 54: unpublished 2017 onference, textbook in progress
## 55: unpublished 2023 In progress
## 56: unpublished 2024 In progress
## 57: unpublished 2024 In progress
## 58: unpublished 2024 In progress
## 59: unpublished 2024 In progress
## 60: unpublished 2024 In progress
## 61: unpublished 2024 In progress
## 62: unpublished 2024 Preprint arXiv:2408.00856
## 63: unpublished 2023 Preprint arXiv:2302.11062
## 64: unpublished 2024 In progress
## 65: unpublished 2024 In progress
## 66: unpublished 2014 Preprint arXiv:1401.8008
## type year venue
Authors
Author names come in two forms:
- Family, Given1 Given2
- Given1 Given2 Family
subject <- c("Toby Dylan Hocking", "Hocking, Toby Dylan")
alt.pattern <- nc::alternatives_with_shared_groups(
family="[A-Z][^,]+",
given="[^,]+",
list("^", given, " ", family, "$"),
list("^", family, ", ", given, "$"))
nc::capture_first_vec(subject, alt.pattern)
## given family
## <char> <char>
## 1: Toby Dylan Hocking
## 2: Toby Dylan Hocking
The pattern above matches either of the two forms. Below we use it to match all of the data.
(authors <- refs.wide[, {
complete <- strsplit(author, split=" and ")[[1]]
data.table(complete, nc::capture_first_vec(
complete,
alt.pattern,
nomatch.error=FALSE))
}, by=ref
][
, abbrev := gsub("[a-z. ]", "", given)
][
, show := ifelse(is.na(family), complete, paste(family, abbrev))
][])
## ref complete given family abbrev show
## <char> <char> <char> <char> <char> <char>
## 1: Abraham2021gut Abraham, Andrew J. Andrew J. Abraham AJ Abraham AJ
## 2: Abraham2021gut Prys-Jones, Tomos O. Tomos O. Prys-Jones TO Prys-Jones TO
## 3: Abraham2021gut De Cuyper, Annelies Annelies De Cuyper A De Cuyper A
## 4: Abraham2021gut Ridenour, Chase Chase Ridenour C Ridenour C
## 5: Abraham2021gut Hempson, Gareth P. Gareth P. Hempson GP Hempson GP
## ---
## 382: Truong2024circular Toby Dylan Hocking Toby Dylan Hocking TD Hocking TD
## 383: Venuto2014support Venuto, D D Venuto D Venuto D
## 384: Venuto2014support Hocking, Toby Dylan Toby Dylan Hocking TD Hocking TD
## 385: Venuto2014support Sphanurattana, L L Sphanurattana L Sphanurattana L
## 386: Venuto2014support Sugiyama, M M Sugiyama M Sugiyama M
The table above shows all names standardized to a common format in the show
column.
Below we verify that all names matched.
authors[is.na(family)]
## Empty data.table (0 rows and 6 cols): ref,complete,given,family,abbrev,show
The table above shows that there are no entries that did not match the regex, which is OK.
abbrev.dt <- authors[, .(
authors_abbrev=paste(show, collapse=", ")
), by=ref]
abbrev.dt[, length(grep("Hocking",authors_abbrev))]
## [1] 66
The output above shows that there are 66 items for which I am listed as an author.
abbrev.dt[, .(ref, authors_abbrev=substr(authors_abbrev,1,30))]
## ref authors_abbrev
## <char> <char>
## 1: Abraham2021gut Abraham AJ, Prys-Jones TO, De
## 2: Alirezaie2018clinpred Alirezaie N, Kernohan KD, Hart
## 3: Barnwal2022jcgs Barnwal A, Cho H, Hocking T
## 4: Bodine2024mapping Bodine CS, Buscombe D, Hocking
## 5: Chaves2022chatbot Chaves AP, Egbert J, Hocking T
## 6: Chicard2016cancer Chicard M, Boyault S, Colmet D
## 7: Depuydt2018genomic Depuydt P, Boeva V, Hocking TD
## 8: Depuydt2018meta Depuydt P, Koster J, Boeva V,
## 9: Doyon2008heritable Doyon Y, McCammon JM, Miller J
## 10: Fotoohinasab2021greedy Fotoohinasab A, Hocking T, Afg
## 11: Gautier2010bayesian Gautier M, Hocking TD, Foulley
## 12: Harshe2023exoskeleton Harshe K, Williams JR, Hocking
## 13: Hillman2023jmlr Hillman J, Hocking TD
## 14: Hocking2013bioinformatics Hocking TD, Schleiermacher G,
## 15: Hocking2013sustainable Hocking TD, Wutzler T, Ponting
## 16: Hocking2014bioinformatics Hocking TD, Boeva V, Rigaill G
## 17: Hocking2017bioinfo Hocking TD, Goerner-Potvin P,
## 18: Hocking2019regex Hocking TD
## 19: Hocking2020jmlr Hocking TD, Rigaill G, Fearnhe
## 20: Hocking2021reshaping Hocking TD
## 21: Hocking2022jss Hocking TD, Rigaill G, Fearnhe
## 22: Hocking2023lopart Hocking TD, Srivastava A
## 23: Jewell2019biostatistics Jewell SW, Hocking TD, Fearnhe
## 24: Kaufman2024functional Kaufman JM, Stenberg AJ, Hocki
## 25: Liehrmann2021chipseq Liehrmann A, Rigaill G, Hockin
## 26: Maidstone2017optimal Maidstone R, Hocking T, Rigail
## 27: Mihaljevic2022sparsemodr Mihaljevic JR, Borkovec S, Rat
## 28: Runge2023jss Runge V, Hocking TD, Romano G,
## 29: Shimada2016leukemia Shimada K, Shimada S, Sugimoto
## 30: Sievert2019jcgs Sievert C, VanderPlas S, Cai J
## 31: Suguro2014cancer Suguro M, Yoshida N, Umino A,
## 32: Tao2023nature Tao F, Huang Y, Hungate BA, Ma
## 33: Tao2024reply Tao F, Houlton BZ, Frey SD, Le
## 34: Vargovich2022breakpoints Vargovich J, Hocking TD
## 35: Hocking22intro Hocking TD
## 36: Drouin2017mmit Drouin A, Hocking T, Laviolett
## 37: Fotoohinasab2020automaticQRSdetectionAsilomar Fotoohinasab A, Hocking T, Afg
## 38: Fotoohinasab2020segmentationEMBC Fotoohinasab A, Hocking T, Afg
## 39: Hocking2011clusterpath Hocking TD, Joulin A, Bach F,
## 40: Hocking2013icml Rigaill G, Hocking T, Vert J-P
## 41: Hocking2015icml Hocking TD, Rigaill G, Bourque
## 42: Hocking2020psb Hocking TD, Bourque G
## 43: Kolla2021fuzz Kolla AC, Groce A, Hocking TD
## 44: Sweeney2023insect Sweeney N, Xu C, Shaw JA, Hock
## 45: barr2022classifying Barr JR, Hocking TD, Morton G,
## 46: barr2022graph Barr JR, Shaw P, Abu-Khzam FN,
## 47: hocking2022interpretable Hocking TD, Barr JR, Thatcher
## 48: Hocking2012phd Hocking TD
## 49: Agyapong2023cv Agyapong D, Propster JR, Marks
## 50: Fowler2024line Fowler J, Hocking TD
## 51: Gurney2024power Gurney K, Aslam B, Dass P, Gaw
## 52: Hocking2015breakpointError Hocking TD
## 53: Hocking2016interactive Hocking TD, Ekstrøm CT
## 54: Hocking2017changepoint Hocking TD, Killick R
## 55: Hocking2023functional Hocking TD
## 56: Hocking2024autism Sutherland V, Hocking TD, Lind
## 57: Hocking2024binsegRcpp Hocking TD
## 58: Hocking2024finite Hocking TD
## 59: Hocking2024hmm Hocking TD
## 60: Hocking2024mlr3resampling Hocking TD
## 61: Hocking2024soak Bodine CS, Thibault G, Arellan
## 62: Nguyen2024penalty Nguyen T, Hocking TD
## 63: Rust2023pairs Rust KR, Hocking TD
## 64: Thibault2024forest Thibault G, Hocking TD, Achim
## 65: Truong2024circular Truong C, Hocking TD
## 66: Venuto2014support Venuto D, Hocking TD, Sphanura
## ref authors_abbrev
The output above shows the abbreviated author list is reasonable.
Count article types
type2long <- c(
article="journal paper",
incollection="book chapter",
inproceedings="conference paper",
phdthesis="PHD thesis",
unpublished="in progress")
refs.wide[, .(count=.N), by=.(Type=type2long[type])][order(-count)]
## Type count
## <char> <int>
## 1: journal paper 34
## 2: in progress 18
## 3: conference paper 12
## 4: book chapter 1
## 5: PHD thesis 1
Output markdown
The code below joins the authors back to the original table, then outputs markdown.
abbrev.wide <- refs.wide[
abbrev.dt, on="ref"
][, let(
heading = ifelse(type=="unpublished", "In progress", year),
citation = sprintf("- %s (%s). %s. %s. %s", authors_abbrev, year, title, venue, links)
)][order(-heading, -year, authors_abbrev)]
abbrev.some <- abbrev.wide[unique(heading)[1:3], .SD[1:2], on="heading", by=heading]
abbrev.some <- abbrev.wide
abbrev.some[
, .(markdown=sprintf("### %s\n%s\n", heading, paste(citation, collapse="\n")))
, by=heading
][
, cat(paste(markdown, collapse="\n"))
]
In progress
- Bodine CS, Thibault G, Arellano PN, Shenkin AF, Lindly O, Hocking TD (2024). SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets. In progress. Software, Reproducible
- Fowler J, Hocking TD (2024). Efficient line search for optimizing Area Under the ROC Curve in gradient descent. In progress. Talk announcement for JSM’23 Toronto, Video of talk at Université Laval July 2023, Slides PDF, source
- Gurney K, Aslam B, Dass P, Gawuc L, Hocking TD, Barber JJ, Kato A (2024). Assessment of the Climate Trace global powerplant CO2 emissions. Under review at Environmental Research Letters. NA
- Hocking TD (2024). Comparing binsegRcpp with other implementations of binary segmentation for changepoint detection. In progress. Reproducible, Software
- Hocking TD (2024). Finite Sample Complexity Analysis of Binary Segmentation. In progress. Reproducible, Software
- Hocking TD (2024). Teaching Hidden Markov Models Using Interactive Data Visualization. In progress. Reproducible, Slides
- Hocking TD (2024). mlr3resampling: an R implementation of cross-validation for comparing models learned using different train subsets. In progress. Software
- Nguyen T, Hocking TD (2024). Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization. Preprint arXiv:2408.00856. Preprint
- Sutherland V, Hocking TD, Lindly O (2024). Interpretable machine learning algorithms for understanding factors related to childhood autism. In progress. Reproducible
- Thibault G, Hocking TD, Achim A (2024). Predicting forest burn from satellite image data. In progress. NA
- Truong C, Hocking TD (2024). Efficient change-point detection for multivariate circular data. In progress. Reproducible, Software
- Agyapong D, Propster JR, Marks J, Hocking TD (2023). Cross-Validation for Training and Testing Co-occurrence Network Inference Algorithms. Preprint arXiv:2309.15225, under review at BMC Bioinformatics. Preprint
- Hocking TD (2023). Why does functional pruning yield such fast algorithms for optimal changepoint detection?. In progress. Invited talk for TRIPODS seminar video, slides, IEEE NJACS, ASU West ML Day
- Rust KR, Hocking TD (2023). A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss. Preprint arXiv:2302.11062. Preprint
- Hocking TD, Killick R (2017). Introduction to optimal changepoint detection algorithms. Tutorial at international useR 2017 conference, textbook in progress. Conference, Book, Reproducible
- Hocking TD, Ekstrøm CT (2016). Understanding and Creating Interactive Graphics. Tutorial at international useR 2016 conference, textbook in progress. Conference, Manual, Reproducible
- Hocking TD (2015). A breakpoint detection error function for segmentation model selection and validation. Preprint arXiv:1509.00368. Preprint, Software, Reproducible
- Venuto D, Hocking TD, Sphanurattana L, Sugiyama M (2014). Support vector comparison machines. Preprint arXiv:1401.8008. Preprint, Software, Reproducible
2024
- Bodine CS, Buscombe D, Hocking TD (2024). Automated River Substrate Mapping From Sonar Imagery With Machine Learning. Journal of Geophysical Research: Machine Learning and Computation 1(3). DOI, Preprint, Software
- Kaufman JM, Stenberg AJ, Hocking TD (2024). Functional Labeled Optimal Partitioning. Journal of Computational and Graphical Statistics, DOI: 10 . 1080/10618600 . 2023 . 2293216. DOI, Preprint, Software, Reproducible, Preliminary code
- Tao F, Houlton BZ, Frey SD, Lehmann J, Manzoni S, Huang Y, Jiang L, Mishra U, Hungate BA, Schmidt MWI, Reichstein M, Carvalhais N, Ciais P, Wang Y-P, Ahrens B, Hugelius G, Hocking TD, Lu X, Shi Z, Viatkin K, Vargas R, Yigini Y, Omuto C, Malik AA, Peralta G, Cuevas-Corona R, Paolo LED, Luotto I, Liao C, Liang Y-S, Saynes VS, Huang X, Luo Y (2024). Reply to: Model uncertainty obscures major driver of soil carbon. Nature 627(8002). Publisher, Preprint
2023
- Harshe K, Williams JR, Hocking TD, Lerner ZF (2023). Predicting Neuromuscular Engagement to Improve Gait Training With a Robotic Ankle Exoskeleton. IEEE Robotics and Automation Letters 8(8). Publisher
- Hillman J, Hocking TD (2023). Optimizing ROC Curves with a Sort-Based Surrogate Loss for Binary Classification and Changepoint Detection. Journal of Machine Learning Research 24(70). Publisher, Preprint, Software, Reproducible
- Hocking TD, Srivastava A (2023). Labeled optimal partitioning. Computational Statistics 38. Publisher, Preprint, Video, Software, Reproducible
- Runge V, Hocking TD, Romano G, Afghah F, Fearnhead P, Rigaill G (2023). gfpop: An R Package for Univariate Graph-Constrained Change-Point Detection. Journal of Statistical Software 106(6). Publisher, Preprint, Software, GUI, Reproducible
- Sweeney N, Xu C, Shaw JA, Hocking TD, Whitaker BM (2023). Insect Identification in Pulsed Lidar Images Using Changepoint Detection Algorithms. 2023 Intermountain Engineering, Technology and Computing (IETC). Publisher
- Tao F, Huang Y, Hungate BA, Manzoni S, Frey SD, Schmidt MWI, Reichstein M, Carvalhais N, Ciais P, Jiang L, Lehmann J, Mishra U, Hugelius G, Hocking TD, Lu X, Shi Z, Viatkin K, Vargas R, Yigini Y, Omuto C, Malik AA, Perualta G, Cuevas-Corona R, Paolo LED, Luotto I, Liao C, Liang Y-S, Saynes VS, Huang X, Luo Y (2023). Microbial carbon use efficiency promotes global soil carbon storage. Nature 618. Publisher
2022
- Barnwal A, Cho H, Hocking T (2022). Survival Regression with Accelerated Failure Time Model in XGBoost. Journal of Computational and Graphical Statistics 31(4). Publisher, Preprint, Software, Documentation, Video, Reproducible
- Barr JR, Hocking TD, Morton G, Thatcher T, Shaw P (2022). Classifying Imbalanced Data with AUM Loss. 2022 Fourth International Conference on Transdisciplinary AI (TransAI). Publisher
- Barr JR, Shaw P, Abu-Khzam FN, Thatcher T, Hocking TD (2022). Graph embedding: A methodological survey. 2022 fourth international conference on transdisciplinary AI (TransAI). Publisher
- Chaves AP, Egbert J, Hocking T, Doerry E, Gerosa MA (2022). Chatbots Language Design: The Influence of Language Variation on User Experience with Tourist Assistant Chatbots. ACM Trans. Comput.-Hum. Interact. 29(2). Publisher, Preprint
- Hocking TD (2022). Introduction to machine learning and neural networks. Chapter in Land Carbon Cycle Modeling: Matrix Approach, Data Assimilation, and Ecological Forecasting, edited by Yiqi Luo, published by Taylor and Francis. Publisher, My chapter, Reproducible
- Hocking TD, Barr JR, Thatcher T (2022). Interpretable linear models for predicting security vulnerabilities in source code. 2022 Fourth International Conference on Transdisciplinary AI (TransAI). Publisher
- Hocking TD, Rigaill G, Fearnhead P, Bourque G (2022). Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data. Journal of Statistical Software 101(10). Publisher, Software, Reproducible, Slides, Video
- Mihaljevic JR, Borkovec S, Ratnavale S, Hocking TD, Banister KE, Eppinger JE, Hepp C, Doerry E (2022). SPARSEMODr: Rapidly simulate spatially explicit and stochastic models of COVID-19 and other infectious diseases. Biology Methods and Protocols 7(1). Publisher, Preprint, Software
- Vargovich J, Hocking TD (2022). Linear time dynamic programming for computing breakpoints in the regularization path of models selected from a finite set. Journal of Computational and Graphical Statistics 31(2). Publisher, Preprint, Software, Reproducible
2021
- Abraham AJ, Prys-Jones TO, De Cuyper A, Ridenour C, Hempson GP, Hocking T, Clauss M, Doughty CE (2021). Improved estimation of gut passage time considerably affects trait-based dispersal models. Functional Ecology 35(4). Publisher
- Fotoohinasab A, Hocking T, Afghah F (2021). A greedy graph search algorithm based on changepoint analysis for automatic QRS complex detection. Computers in Biology and Medicine 130. Publisher, Preprint
- Hocking TD (2021). Wide-to-tall Data Reshaping Using Regular Expressions and the nc Package. The R Journal 13(1). Publisher, Software, Reproducible
- Kolla AC, Groce A, Hocking TD (2021). Fuzz Testing the Compiled Code in R Packages. 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). Publisher, Software, GitHub Action, Abstract, Video, Blog, Results
- Liehrmann A, Rigaill G, Hocking TD (2021). Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models. BMC Bioinformatics 22(323). Publisher, Reproducible, Software
2020
- Fotoohinasab A, Hocking T, Afghah F (2020). A Graph-Constrained Changepoint Learning Approach for Automatic QRS-Complex Detection. 2020 54th Asilomar Conference on Signals, Systems, and Computers. Publisher, Abstract, Preprint
- Fotoohinasab A, Hocking T, Afghah F (2020). A Graph-constrained Changepoint Detection Approach for ECG Segmentation. 2020 42nd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC). Publisher
- Hocking TD, Bourque G (2020). Machine Learning Algorithms for Simultaneous Supervised Detection of Peaks in Multiple Samples and Cell Types. Proc. Pacific Symposium on Biocomputing. Publisher, Software, Preprint, Reproducible
- Hocking TD, Rigaill G, Fearnhead P, Bourque G (2020). Constrained Dynamic Programming and Supervised Penalty Learning Algorithms for Peak Detection in Genomic Data. Journal of Machine Learning Research 21(87). Publisher, Preprint, Software, Reproducible
2019
- Hocking TD (2019). Comparing namedCapture with other R packages for regular expressions. The R Journal 11(2). Publisher, Software, Reproducible
- Jewell SW, Hocking TD, Fearnhead P, Witten DM (2019). Fast nonconvex deconvolution of calcium imaging data. Biostatistics 21(4). Pubmed
- Sievert C, VanderPlas S, Cai J, Ferris K, Khan FUF, Hocking TD (2019). Extending ggplot2 for Linked and Animated Web Graphics. Journal of Computational and Graphical Statistics 28(2). Publisher, Software, Reproducible, Interactive Figures
2018
- Alirezaie N, Kernohan KD, Hartley T, Majewski J, Hocking TD (2018). ClinPred: Prediction Tool to Identify Disease-Relevant Nonsynonymous Single-Nucleotide Variants. The American Journal of Human Genetics 103(4). DOI, Software, used in dbNSFP
- Depuydt P, Boeva V, Hocking TD, Cannoodt R, Ambros IM, Ambros PF, Asgharzadeh S, Attiyeh EF, Combaret Vé, Defferrari R, Fischer M, Hero B, Hogarty MD, Irwin MS, Koster J, Kreissman S, Ladenstein R, Lapouble E, Laureys Gè, London WB, Mazzocco K, Nakagawara A, Noguera R, Ohira M, Park JR, Pötschger U, Theissen J, Tonini GP, Valteau-Couanet D, Varesio L, Versteeg R, Speleman F, Maris JM, Schleiermacher G, Preter KD (2018). Genomic amplifications and distal 6q loss: novel markers for poor survival in high-risk neuroblastoma patients. JNCI: Journal of the National Cancer Institute 110(10). Publisher
- Depuydt P, Koster J, Boeva V, Hocking TD, Speleman F, Schleiermacher G, De Preter K (2018). Meta-mining of copy number profiles of high-risk neuroblastoma tumors. Scientific data 5(1). Publisher
2017
- Drouin A, Hocking T, Laviolette F (2017). Maximum Margin Interval Trees. Advances in Neural Information Processing Systems 30. Publisher, Software, Reproducible, Preprint, Video
- Hocking TD, Goerner-Potvin P, Morin A, Shao X, Pastinen T, Bourque G (2017). Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning. Bioinformatics 33(4). Pubmed, Software, Reproducible, Data
- Maidstone R, Hocking T, Rigaill G, Fearnhead P (2017). On optimal multiple changepoint algorithms for large data. Statistics and Computing 27. Publisher, Software, Reproducible
2016
- Chicard M, Boyault S, Colmet Daage L, Richer W, Gentien D, Pierron G, Lapouble E, Bellini A, Clement N, Iacono I, Bréjon Sé, Carrere M, Reyes Cé, Hocking T, Bernard V, Peuchmaur M, Corradini Nè, Faure-Conter Cé, Coze C, Plantaz D, Defachelles AS, Thebaud E, Gambart M, Millot Féé, Valteau-Couanet D, Michon J, Puisieux A, Delattre O, Combaret Vé, Schleiermacher G (2016). Genomic Copy Number Profiling Using Circulating Free Tumor DNA Highlights Heterogeneity in Neuroblastoma. Clinical Cancer Research 22(22). Publisher
- Shimada K, Shimada S, Sugimoto K, Nakatochi M, Suguro M, Hirakawa A, Hocking TD, Takeuchi I, Tokunaga T, Takagi Y, Sakamoto A, Aoki T, Naoe T, Nakamura S, Hayakawa F, Seto M, Tomita A, Kiyoi H (2016). Development and analysis of patient-derived xenograft mouse models in intravascular large B-cell lymphoma. Leukemia 30(7). Pubmed
2015
- Hocking TD, Rigaill G, Bourque G (2015). PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data. Proc. 32nd ICML. Publisher, Video, Software, Reproducible
2014
- Hocking TD, Boeva V, Rigaill G, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Richer W, Bourdeaut F, Suguro M, Seto M, Bach F, Vert JP (2014). SegAnnDB: interactive Web-based genomic segmentation. Bioinformatics 30(11). Pubmed, Software, Reproducible, Preprint, Package, Reproducible
- Suguro M, Yoshida N, Umino A, Kato H, Tagawa H, Nakagawa M, Fukuhara N, Karnan S, Takeuchi I, Hocking TD, Arita K, Karube K, Tsuzuki S, Nakamura S, Kinoshita T, Seto M (2014). Clonal heterogeneity of lymphoid malignancies correlates with poor prognosis. Cancer Sci 105(7). Pubmed
2013
- Hocking TD, Schleiermacher G, Janoueix-Lerosey I, Boeva V, Cappo J, Delattre O, Bach F, Vert J-P (2013). Learning smoothing models of copy number profiles using breakpoint annotations. BMC Bioinformatics 14(164). Publisher, Software, Reproducible
- Hocking TD, Wutzler T, Ponting K, Grosjean P (2013). Sustainable, Extensible Documentation Generation using inlinedocs. Journal of Statistical Software 54. Publisher, Software, Reproducible
- Rigaill G, Hocking T, Vert J-P, Bach F (2013). Learning sparse penalties for change-point detection using max margin interval regression. Proc. 30th ICML. Publisher, Video, Software, Reproducible
2012
- Hocking TD (2012). Learning algorithms and statistical software, with applications to bioinformatics. PHD thesis, Ecole normale supérieure de Cachan. Publisher, Reproducible
2011
- Hocking TD, Joulin A, Bach F, Vert J-P (2011). Clusterpath an algorithm for clustering using convex fusion penalties. 28th international conference on machine learning. Publisher, Video, Software, Reproducible, Cited in Ihaka lecture
2010
- Gautier M, Hocking TD, Foulley J-L (2010). A Bayesian outlier criterion to detect SNPs under selection in large data sets. PLoS one 5(8). Publisher
2008
- Doyon Y, McCammon JM, Miller JC, Faraji F, Ngo C, Katibah GE, Amora R, Hocking TD, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Amacher SL (2008). Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases. Nature biotechnology 26(6). Pubmed NULL
Another output: markdown table with images
One advantage of this is that we can easily modify output formats.
for(large.png in Sys.glob("../assets/img/publications/*.png")){
thumb.png <- file.path(
dirname(large.png),
"thumb",
basename(large.png))
if(!file.exists(thumb.png)){
dir.create(dirname(thumb.png))
convert.cmd <- paste("convert", large.png, "-scale 150", thumb.png)
print(convert.cmd)
system(convert.cmd)
}
}
some.out <- abbrev.some[
, figure.png := sprintf("/assets/img/publications/thumb/%s.png", ref)
][, .(
Figure=ifelse(
file.exists(file.path("..", figure.png)),
sprintf('<img src="%s" width="150" />', figure.png),
""),
Published=heading,
Authors=authors_abbrev,
Title=title,
Venue=venue,
Links=ifelse(is.na(links), "", links)
)]
knitr::kable(some.out)
Figure | Published | Authors | Title | Venue | Links |
---|---|---|---|---|---|
In progress | Bodine CS, Thibault G, Arellano PN, Shenkin AF, Lindly O, Hocking TD | SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets | In progress | Software, Reproducible | |
In progress | Fowler J, Hocking TD | Efficient line search for optimizing Area Under the ROC Curve in gradient descent | In progress | Talk announcement for JSM’23 Toronto, Video of talk at Université Laval July 2023, Slides PDF, source | |
In progress | Gurney K, Aslam B, Dass P, Gawuc L, Hocking TD, Barber JJ, Kato A | Assessment of the Climate Trace global powerplant CO2 emissions | Under review at Environmental Research Letters | ||
In progress | Hocking TD | Comparing binsegRcpp with other implementations of binary segmentation for changepoint detection | In progress | Reproducible, Software | |
In progress | Hocking TD | Finite Sample Complexity Analysis of Binary Segmentation | In progress | Reproducible, Software | |
In progress | Hocking TD | Teaching Hidden Markov Models Using Interactive Data Visualization | In progress | Reproducible, Slides | |
In progress | Hocking TD | mlr3resampling: an R implementation of cross-validation for comparing models learned using different train subsets | In progress | Software | |
In progress | Nguyen T, Hocking TD | Deep Learning Approach for Changepoint Detection: Penalty Parameter Optimization | Preprint arXiv:2408.00856 | Preprint | |
In progress | Sutherland V, Hocking TD, Lindly O | Interpretable machine learning algorithms for understanding factors related to childhood autism | In progress | Reproducible | |
In progress | Thibault G, Hocking TD, Achim A | Predicting forest burn from satellite image data | In progress | ||
In progress | Truong C, Hocking TD | Efficient change-point detection for multivariate circular data | In progress | Reproducible, Software | |
In progress | Agyapong D, Propster JR, Marks J, Hocking TD | Cross-Validation for Training and Testing Co-occurrence Network Inference Algorithms | Preprint arXiv:2309.15225, under review at BMC Bioinformatics | Preprint | |
In progress | Hocking TD | Why does functional pruning yield such fast algorithms for optimal changepoint detection? | In progress | Invited talk for TRIPODS seminar video, slides, IEEE NJACS, ASU West ML Day | |
In progress | Rust KR, Hocking TD | A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss | Preprint arXiv:2302.11062 | Preprint | |
In progress | Hocking TD, Killick R | Introduction to optimal changepoint detection algorithms | Tutorial at international useR 2017 conference, textbook in progress | Conference, Book, Reproducible | |
In progress | Hocking TD, Ekstrøm CT | Understanding and Creating Interactive Graphics | Tutorial at international useR 2016 conference, textbook in progress | Conference, Manual, Reproducible | |
In progress | Hocking TD | A breakpoint detection error function for segmentation model selection and validation | Preprint arXiv:1509.00368 | Preprint, Software, Reproducible | |
In progress | Venuto D, Hocking TD, Sphanurattana L, Sugiyama M | Support vector comparison machines | Preprint arXiv:1401.8008 | Preprint, Software, Reproducible | |
2024 | Bodine CS, Buscombe D, Hocking TD | Automated River Substrate Mapping From Sonar Imagery With Machine Learning | Journal of Geophysical Research: Machine Learning and Computation 1(3) | DOI, Preprint, Software | |
2024 | Kaufman JM, Stenberg AJ, Hocking TD | Functional Labeled Optimal Partitioning | Journal of Computational and Graphical Statistics, DOI: 10 . 1080/10618600 . 2023 . 2293216 | DOI, Preprint, Software, Reproducible, Preliminary code | |
2024 | Tao F, Houlton BZ, Frey SD, Lehmann J, Manzoni S, Huang Y, Jiang L, Mishra U, Hungate BA, Schmidt MWI, Reichstein M, Carvalhais N, Ciais P, Wang Y-P, Ahrens B, Hugelius G, Hocking TD, Lu X, Shi Z, Viatkin K, Vargas R, Yigini Y, Omuto C, Malik AA, Peralta G, Cuevas-Corona R, Paolo LED, Luotto I, Liao C, Liang Y-S, Saynes VS, Huang X, Luo Y | Reply to: Model uncertainty obscures major driver of soil carbon | Nature 627(8002) | Publisher, Preprint | |
2023 | Harshe K, Williams JR, Hocking TD, Lerner ZF | Predicting Neuromuscular Engagement to Improve Gait Training With a Robotic Ankle Exoskeleton | IEEE Robotics and Automation Letters 8(8) | Publisher | |
2023 | Hillman J, Hocking TD | Optimizing ROC Curves with a Sort-Based Surrogate Loss for Binary Classification and Changepoint Detection | Journal of Machine Learning Research 24(70) | Publisher, Preprint, Software, Reproducible | |
2023 | Hocking TD, Srivastava A | Labeled optimal partitioning | Computational Statistics 38 | Publisher, Preprint, Video, Software, Reproducible | |
2023 | Runge V, Hocking TD, Romano G, Afghah F, Fearnhead P, Rigaill G | gfpop: An R Package for Univariate Graph-Constrained Change-Point Detection | Journal of Statistical Software 106(6) | Publisher, Preprint, Software, GUI, Reproducible | |
2023 | Sweeney N, Xu C, Shaw JA, Hocking TD, Whitaker BM | Insect Identification in Pulsed Lidar Images Using Changepoint Detection Algorithms | 2023 Intermountain Engineering, Technology and Computing (IETC) | Publisher | |
2023 | Tao F, Huang Y, Hungate BA, Manzoni S, Frey SD, Schmidt MWI, Reichstein M, Carvalhais N, Ciais P, Jiang L, Lehmann J, Mishra U, Hugelius G, Hocking TD, Lu X, Shi Z, Viatkin K, Vargas R, Yigini Y, Omuto C, Malik AA, Perualta G, Cuevas-Corona R, Paolo LED, Luotto I, Liao C, Liang Y-S, Saynes VS, Huang X, Luo Y | Microbial carbon use efficiency promotes global soil carbon storage | Nature 618 | Publisher | |
2022 | Barnwal A, Cho H, Hocking T | Survival Regression with Accelerated Failure Time Model in XGBoost | Journal of Computational and Graphical Statistics 31(4) | Publisher, Preprint, Software, Documentation, Video, Reproducible | |
2022 | Barr JR, Hocking TD, Morton G, Thatcher T, Shaw P | Classifying Imbalanced Data with AUM Loss | 2022 Fourth International Conference on Transdisciplinary AI (TransAI) | Publisher | |
2022 | Barr JR, Shaw P, Abu-Khzam FN, Thatcher T, Hocking TD | Graph embedding: A methodological survey | 2022 fourth international conference on transdisciplinary AI (TransAI) | Publisher | |
2022 | Chaves AP, Egbert J, Hocking T, Doerry E, Gerosa MA | Chatbots Language Design: The Influence of Language Variation on User Experience with Tourist Assistant Chatbots | ACM Trans. Comput.-Hum. Interact. 29(2) | Publisher, Preprint | |
2022 | Hocking TD | Introduction to machine learning and neural networks | Chapter in Land Carbon Cycle Modeling: Matrix Approach, Data Assimilation, and Ecological Forecasting, edited by Yiqi Luo, published by Taylor and Francis | Publisher, My chapter, Reproducible | |
2022 | Hocking TD, Barr JR, Thatcher T | Interpretable linear models for predicting security vulnerabilities in source code | 2022 Fourth International Conference on Transdisciplinary AI (TransAI) | Publisher | |
2022 | Hocking TD, Rigaill G, Fearnhead P, Bourque G | Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data | Journal of Statistical Software 101(10) | Publisher, Software, Reproducible, Slides, Video | |
2022 | Mihaljevic JR, Borkovec S, Ratnavale S, Hocking TD, Banister KE, Eppinger JE, Hepp C, Doerry E | SPARSEMODr: Rapidly simulate spatially explicit and stochastic models of COVID-19 and other infectious diseases | Biology Methods and Protocols 7(1) | Publisher, Preprint, Software | |
2022 | Vargovich J, Hocking TD | Linear time dynamic programming for computing breakpoints in the regularization path of models selected from a finite set | Journal of Computational and Graphical Statistics 31(2) | Publisher, Preprint, Software, Reproducible | |
2021 | Abraham AJ, Prys-Jones TO, De Cuyper A, Ridenour C, Hempson GP, Hocking T, Clauss M, Doughty CE | Improved estimation of gut passage time considerably affects trait-based dispersal models | Functional Ecology 35(4) | Publisher | |
2021 | Fotoohinasab A, Hocking T, Afghah F | A greedy graph search algorithm based on changepoint analysis for automatic QRS complex detection | Computers in Biology and Medicine 130 | Publisher, Preprint | |
2021 | Hocking TD | Wide-to-tall Data Reshaping Using Regular Expressions and the nc Package | The R Journal 13(1) | Publisher, Software, Reproducible | |
2021 | Kolla AC, Groce A, Hocking TD | Fuzz Testing the Compiled Code in R Packages | 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) | Publisher, Software, GitHub Action, Abstract, Video, Blog, Results | |
2021 | Liehrmann A, Rigaill G, Hocking TD | Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models | BMC Bioinformatics 22(323) | Publisher, Reproducible, Software | |
2020 | Fotoohinasab A, Hocking T, Afghah F | A Graph-Constrained Changepoint Learning Approach for Automatic QRS-Complex Detection | 2020 54th Asilomar Conference on Signals, Systems, and Computers | Publisher, Abstract, Preprint | |
2020 | Fotoohinasab A, Hocking T, Afghah F | A Graph-constrained Changepoint Detection Approach for ECG Segmentation | 2020 42nd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC) | Publisher | |
2020 | Hocking TD, Bourque G | Machine Learning Algorithms for Simultaneous Supervised Detection of Peaks in Multiple Samples and Cell Types | Proc. Pacific Symposium on Biocomputing | Publisher, Software, Preprint, Reproducible | |
2020 | Hocking TD, Rigaill G, Fearnhead P, Bourque G | Constrained Dynamic Programming and Supervised Penalty Learning Algorithms for Peak Detection in Genomic Data | Journal of Machine Learning Research 21(87) | Publisher, Preprint, Software, Reproducible | |
2019 | Hocking TD | Comparing namedCapture with other R packages for regular expressions | The R Journal 11(2) | Publisher, Software, Reproducible | |
2019 | Jewell SW, Hocking TD, Fearnhead P, Witten DM | Fast nonconvex deconvolution of calcium imaging data | Biostatistics 21(4) | Pubmed | |
2019 | Sievert C, VanderPlas S, Cai J, Ferris K, Khan FUF, Hocking TD | Extending ggplot2 for Linked and Animated Web Graphics | Journal of Computational and Graphical Statistics 28(2) | Publisher, Software, Reproducible, Interactive Figures | |
2018 | Alirezaie N, Kernohan KD, Hartley T, Majewski J, Hocking TD | ClinPred: Prediction Tool to Identify Disease-Relevant Nonsynonymous Single-Nucleotide Variants | The American Journal of Human Genetics 103(4) | DOI, Software, used in dbNSFP | |
2018 | Depuydt P, Boeva V, Hocking TD, Cannoodt R, Ambros IM, Ambros PF, Asgharzadeh S, Attiyeh EF, Combaret Vé, Defferrari R, Fischer M, Hero B, Hogarty MD, Irwin MS, Koster J, Kreissman S, Ladenstein R, Lapouble E, Laureys Gè, London WB, Mazzocco K, Nakagawara A, Noguera R, Ohira M, Park JR, Pötschger U, Theissen J, Tonini GP, Valteau-Couanet D, Varesio L, Versteeg R, Speleman F, Maris JM, Schleiermacher G, Preter KD | Genomic amplifications and distal 6q loss: novel markers for poor survival in high-risk neuroblastoma patients | JNCI: Journal of the National Cancer Institute 110(10) | Publisher | |
2018 | Depuydt P, Koster J, Boeva V, Hocking TD, Speleman F, Schleiermacher G, De Preter K | Meta-mining of copy number profiles of high-risk neuroblastoma tumors | Scientific data 5(1) | Publisher | |
2017 | Drouin A, Hocking T, Laviolette F | Maximum Margin Interval Trees | Advances in Neural Information Processing Systems 30 | Publisher, Software, Reproducible, Preprint, Video | |
2017 | Hocking TD, Goerner-Potvin P, Morin A, Shao X, Pastinen T, Bourque G | Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning | Bioinformatics 33(4) | Pubmed, Software, Reproducible, Data | |
2017 | Maidstone R, Hocking T, Rigaill G, Fearnhead P | On optimal multiple changepoint algorithms for large data | Statistics and Computing 27 | Publisher, Software, Reproducible | |
2016 | Chicard M, Boyault S, Colmet Daage L, Richer W, Gentien D, Pierron G, Lapouble E, Bellini A, Clement N, Iacono I, Bréjon Sé, Carrere M, Reyes Cé, Hocking T, Bernard V, Peuchmaur M, Corradini Nè, Faure-Conter Cé, Coze C, Plantaz D, Defachelles AS, Thebaud E, Gambart M, Millot Féé, Valteau-Couanet D, Michon J, Puisieux A, Delattre O, Combaret Vé, Schleiermacher G | Genomic Copy Number Profiling Using Circulating Free Tumor DNA Highlights Heterogeneity in Neuroblastoma | Clinical Cancer Research 22(22) | Publisher | |
2016 | Shimada K, Shimada S, Sugimoto K, Nakatochi M, Suguro M, Hirakawa A, Hocking TD, Takeuchi I, Tokunaga T, Takagi Y, Sakamoto A, Aoki T, Naoe T, Nakamura S, Hayakawa F, Seto M, Tomita A, Kiyoi H | Development and analysis of patient-derived xenograft mouse models in intravascular large B-cell lymphoma | Leukemia 30(7) | Pubmed | |
2015 | Hocking TD, Rigaill G, Bourque G | PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data | Proc. 32nd ICML | Publisher, Video, Software, Reproducible | |
2014 | Hocking TD, Boeva V, Rigaill G, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Richer W, Bourdeaut F, Suguro M, Seto M, Bach F, Vert JP | SegAnnDB: interactive Web-based genomic segmentation | Bioinformatics 30(11) | Pubmed, Software, Reproducible, Preprint, Package, Reproducible | |
2014 | Suguro M, Yoshida N, Umino A, Kato H, Tagawa H, Nakagawa M, Fukuhara N, Karnan S, Takeuchi I, Hocking TD, Arita K, Karube K, Tsuzuki S, Nakamura S, Kinoshita T, Seto M | Clonal heterogeneity of lymphoid malignancies correlates with poor prognosis | Cancer Sci 105(7) | Pubmed | |
2013 | Hocking TD, Schleiermacher G, Janoueix-Lerosey I, Boeva V, Cappo J, Delattre O, Bach F, Vert J-P | Learning smoothing models of copy number profiles using breakpoint annotations | BMC Bioinformatics 14(164) | Publisher, Software, Reproducible | |
2013 | Hocking TD, Wutzler T, Ponting K, Grosjean P | Sustainable, Extensible Documentation Generation using inlinedocs | Journal of Statistical Software 54 | Publisher, Software, Reproducible | |
2013 | Rigaill G, Hocking T, Vert J-P, Bach F | Learning sparse penalties for change-point detection using max margin interval regression | Proc. 30th ICML | Publisher, Video, Software, Reproducible | |
2012 | Hocking TD | Learning algorithms and statistical software, with applications to bioinformatics | PHD thesis, Ecole normale supérieure de Cachan | Publisher, Reproducible | |
2011 | Hocking TD, Joulin A, Bach F, Vert J-P | Clusterpath an algorithm for clustering using convex fusion penalties | 28th international conference on machine learning | Publisher, Video, Software, Reproducible, Cited in Ihaka lecture | |
2010 | Gautier M, Hocking TD, Foulley J-L | A Bayesian outlier criterion to detect SNPs under selection in large data sets | PLoS one 5(8) | Publisher | |
2008 | Doyon Y, McCammon JM, Miller JC, Faraji F, Ngo C, Katibah GE, Amora R, Hocking TD, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Amacher SL | Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases | Nature biotechnology 26(6) | Pubmed |
The output above is a table with one row per publication, and an image column that shows a figure from the paper. The trick to getting that to display, is putting it in this repo, with a standard name, based on the bib file key.
The code below checks for missing figures.
some.out[Figure=='', Title]
## [1] "Assessment of the Climate Trace global powerplant CO2 emissions"
## [2] "Predicting forest burn from satellite image data"
## [3] "Predicting Neuromuscular Engagement to Improve Gait Training With a Robotic Ankle Exoskeleton"
## [4] "Insect Identification in Pulsed Lidar Images Using Changepoint Detection Algorithms"
## [5] "Survival Regression with Accelerated Failure Time Model in XGBoost"
## [6] "Classifying Imbalanced Data with AUM Loss"
## [7] "Graph embedding: A methodological survey"
## [8] "Chatbots Language Design: The Influence of Language Variation on User Experience with Tourist Assistant Chatbots"
## [9] "Interpretable linear models for predicting security vulnerabilities in source code"
## [10] "Linear time dynamic programming for computing breakpoints in the regularization path of models selected from a finite set"
## [11] "Fuzz Testing the Compiled Code in R Packages"
## [12] "A Graph-Constrained Changepoint Learning Approach for Automatic QRS-Complex Detection"
## [13] "A Graph-constrained Changepoint Detection Approach for ECG Segmentation"
## [14] "Extending ggplot2 for Linked and Animated Web Graphics"
## [15] "Development and analysis of patient-derived xenograft mouse models in intravascular large B-cell lymphoma"
Make sure pdflatex likes it
This part only works in Rmd, not md/jekyll for some reason.
Report mis-match between image files and refs
img.dt <- data.table(ref=sub(".png", "", dir("../assets/img/publications/")))
img.dt[!refs.wide,on="ref"] #images without bib entries
## ref
## <char>
## 1: thumb
refs.wide[!img.dt,.(ref),on="ref"] #bib entries without images
## ref
## <char>
## 1: Barnwal2022jcgs
## 2: Chaves2022chatbot
## 3: Harshe2023exoskeleton
## 4: Shimada2016leukemia
## 5: Sievert2019jcgs
## 6: Vargovich2022breakpoints
## 7: Fotoohinasab2020automaticQRSdetectionAsilomar
## 8: Fotoohinasab2020segmentationEMBC
## 9: Kolla2021fuzz
## 10: Sweeney2023insect
## 11: barr2022classifying
## 12: barr2022graph
## 13: hocking2022interpretable
## 14: Gurney2024power
## 15: Thibault2024forest
Conclusion
We have seen how a bib file can be used to define a publications web page.
- bib file contains peer-reviewed, published papers as
article
(for journals) andinproceedings
(for conferences), incollection
is used for book chapters,phdthesis
is used for PHD thesis,
unpublished
is used for papers in progress (not yet published), meaning one or more of below:
- conference tutorial, note should include which conference; links should include conference page, book/manual page, and github source.
- paper not yet submitted for review (note=In progress),
- Pre-print available, put in note and links.
- If under review or accepted, put venue in note.
Because bibtex ignores fields like links
which are not part of
standard bibliography types, we can put markdown code in there, and
then put it into our markdown pubs page.
Session info
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
## [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics utils datasets grDevices methods base
##
## other attached packages:
## [1] data.table_1.15.99
##
## loaded via a namespace (and not attached):
## [1] compiler_4.4.1 nc_2024.8.15 tools_4.4.1 knitr_1.47 xfun_0.45 evaluate_0.23