Casting tables with `dcast` and `rowid`

4 minute read

In many cases we need to reshape the tables we are working with, that is, change their format from wide to long (melt) and vice versa (cast). It’s usually a trivial operation but sometimes it requires a little more thinking. Let’s see a simplified example of a real problem.

Data generation

A totally random table with 50 rows is generated with the following columns:

id: Random numbers between 1 and 30 (with replacement).
phone: Random numbers between 600000000 and 699999999 (without replacement).
scoring: Random decimal numbers between 1 and 10 (without replacement).
pwd: Random strings with length 8.

The idea behind this data generation is to get an unknown number of rows with the same id with an unknown number of repetitions.

# Load packages
library(data.table)
library(knitr)

# Fixing the seed for reproducible results
set.seed(13)

# Random table generation
myDT <- data.table(
  id = sample(1:30, 50, replace = T),
  phone = sample(600000000:699999999, 50),
  scoring = runif(50, 1, 10),
  pwd = sapply(
    1:50,
    FUN = function(x) paste0(sample(letters, 8, replace = T), collapse = '')
    )
  )

# Order by id
setorder(myDT, id)

# Show data
kable(myDT, caption = 'Random Table', format = 'html')

Random Table
id	phone	scoring	pwd
1	645403893	1.020973	klcapxnp
1	619413048	2.683202	iohatbab
2	609713628	7.300181	yudqfjmg
2	641256896	6.117944	tuimsqil
3	699784180	6.601338	daymvjqa
3	640456925	4.971228	jcvgzbuj
3	699936658	2.814284	bkkutjau
4	659143790	9.478907	xvdzpylw
5	627234317	1.931959	fmpcamnx
5	660916649	1.253950	dctpbtam
7	693736060	2.075693	hnpuolaj
8	639782493	6.433026	rwmwqvdu
10	619850570	8.600371	jzdkusva
10	680929713	6.954972	zeqxntgc
11	645335202	2.682323	lojmtrvs
11	629926642	9.379491	cxqadrdj
12	607223895	7.975297	avxympqg
13	653996293	2.607843	nnuqxigs
13	645038697	8.760344	mcwduuiw
14	646393146	1.607134	lkowclpa
14	694678826	8.862866	bxhelfjs
16	616062260	6.175765	ncukcyzz
17	662680472	5.231123	mkdaflur
17	629480466	2.209104	zdnpwuia
17	671447129	8.875030	oyftrcbp
18	614526543	3.632207	uijzspbs
18	635067768	8.161684	owrevrzb
18	608546595	5.084333	afrcytbl
19	693548878	1.149800	zoqoetuz
19	623388830	2.247059	kthwcoef
19	692795692	7.412554	juaelcsd
20	649672445	4.446254	liutuwqf
20	664961449	9.614372	rnrndysi
20	602632286	3.724153	rqidgdol
21	642990005	9.319300	ytcojema
21	636961722	6.555269	tnsletpt
21	684581198	7.147596	cqnoitbj
22	655936305	6.468607	diypikrs
22	692048681	5.884008	towijsgw
23	634100247	2.145143	ovkziixa
23	631965985	4.648430	owyiszmh
25	642384037	2.570212	yocaenac
26	639233975	7.821415	jabxdxab
27	623326281	6.333832	rcmnymfb
27	657307628	1.835592	ljxframj
27	680172458	2.802887	mxcokckt
28	694408637	3.547176	wsxxhocd
28	632219643	8.953089	xhnvzwae
29	602132896	4.271840	hycucpcz
30	615307379	5.279515	dvhbrnim

Change to wide format:

The goal is to have just one row per id without losing information.

The difficulty here is that the number of columns to add is not constant, varying according to the number of occurrences of each identifier. Of course, it is a problem that can be solved in many ways, especially with such a small table like this. The idea is to show how thanks to the power of R it can be solved in a general way in just one command call:

# Generate wide format table grouping by unique id
df <- dcast(
  myDT,
  formula = id ~ rowid(id),
  value.var = c('phone', 'scoring', 'pwd')
  )

# Show new table
kable(df, caption = 'Wide Format Table', format = 'html')

Wide Format Table
id	phone_1	phone_2	phone_3	scoring_1	scoring_2	scoring_3	pwd_1	pwd_2	pwd_3
1	645403893	619413048	NA	1.020973	2.683202	NA	klcapxnp	iohatbab	NA
2	609713628	641256896	NA	7.300181	6.117944	NA	yudqfjmg	tuimsqil	NA
3	699784180	640456925	699936658	6.601338	4.971228	2.814284	daymvjqa	jcvgzbuj	bkkutjau
4	659143790	NA	NA	9.478907	NA	NA	xvdzpylw	NA	NA
5	627234317	660916649	NA	1.931959	1.253950	NA	fmpcamnx	dctpbtam	NA
7	693736060	NA	NA	2.075693	NA	NA	hnpuolaj	NA	NA
8	639782493	NA	NA	6.433026	NA	NA	rwmwqvdu	NA	NA
10	619850570	680929713	NA	8.600371	6.954972	NA	jzdkusva	zeqxntgc	NA
11	645335202	629926642	NA	2.682323	9.379491	NA	lojmtrvs	cxqadrdj	NA
12	607223895	NA	NA	7.975297	NA	NA	avxympqg	NA	NA
13	653996293	645038697	NA	2.607843	8.760344	NA	nnuqxigs	mcwduuiw	NA
14	646393146	694678826	NA	1.607134	8.862866	NA	lkowclpa	bxhelfjs	NA
16	616062260	NA	NA	6.175765	NA	NA	ncukcyzz	NA	NA
17	662680472	629480466	671447129	5.231123	2.209104	8.875030	mkdaflur	zdnpwuia	oyftrcbp
18	614526543	635067768	608546595	3.632207	8.161684	5.084333	uijzspbs	owrevrzb	afrcytbl
19	693548878	623388830	692795692	1.149800	2.247059	7.412554	zoqoetuz	kthwcoef	juaelcsd
20	649672445	664961449	602632286	4.446254	9.614372	3.724153	liutuwqf	rnrndysi	rqidgdol
21	642990005	636961722	684581198	9.319300	6.555269	7.147596	ytcojema	tnsletpt	cqnoitbj
22	655936305	692048681	NA	6.468607	5.884008	NA	diypikrs	towijsgw	NA
23	634100247	631965985	NA	2.145143	4.648430	NA	ovkziixa	owyiszmh	NA
25	642384037	NA	NA	2.570212	NA	NA	yocaenac	NA	NA
26	639233975	NA	NA	7.821415	NA	NA	jabxdxab	NA	NA
27	623326281	657307628	680172458	6.333832	1.835592	2.802887	rcmnymfb	ljxframj	mxcokckt
28	694408637	632219643	NA	3.547176	8.953089	NA	wsxxhocd	xhnvzwae	NA
29	602132896	NA	NA	4.271840	NA	NA	hycucpcz	NA	NA
30	615307379	NA	NA	5.279515	NA	NA	dvhbrnim	NA	NA

Using dcast with the collaboration of rowid casts the table grouping by id and forces uniqueness without the data being aggregated.

I don’t know if there is an easy way to achieve this with traditional tools like SQL or a spreadsheet, but I’m afraid that a little more work would be involved.

Share on

Twitter Facebook Google+ LinkedIn

Casting tables with `dcast` and `rowid`

Data generation

Change to wide format:

Share on

Leave a comment

You may also enjoy

One Year Of Cubing

Dataviz: My Reads (2017)

Dataviz: My Reads (2016)