Quickly reading very large tables as dataframes
Asked 07 September, 2021
Viewed 2.7K times
58 votes

I have very large tables (30 million rows) that I would like to load as data frames in R. read.table() has a lot of convenient features, but it seems like there is a lot of logic in the implementation that would slow things down. In my case, I am assuming I know the types of the columns ahead of time, the table does not contain any column headers or row names, and does not have any pathological characters that I have to worry about.
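
For context, the read.table() call I am comparing against looks roughly like this (the column names and types are placeholders for illustration, not my actual schema):

# Illustrative read.table() call with the hints mentioned above:
# known column types, no header, and an upper bound on the row count.
df <- read.table('myfile', sep = '\t', header = FALSE,
                 colClasses = c('character', 'numeric', 'numeric', 'numeric'),
                 col.names  = c('url', 'popularity', 'mintime', 'maxtime'),
                 nrows = 30e6,        # approximate row count, used as a size hint
                 comment.char = '')   # disable comment scanning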

I know that reading in a table as a list using scan() can be quite fast, e.g.:

datalist <- scan('myfile', sep='\t', list(url='', popularity=0, mintime=0, maxtime=0))

But some of my attempts to convert this to a data frame appear to slow it down by a factor of 6:

df <- as.data.frame(scan('myfile', sep='\t', list(url='', popularity=0, mintime=0, maxtime=0)))
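
A minimal sketch of a cheaper conversion, assuming the named components that scan() returns are all the same length, would be to set the data-frame attributes on the list directly rather than copying it through as.data.frame(); I have not measured how much this actually helps:

datalist <- scan('myfile', sep = '\t',
                 what = list(url = '', popularity = 0, mintime = 0, maxtime = 0))
# Tag the existing list as a data frame in place; this skips the per-column
# checks and copies that as.data.frame() performs. The components must
# already be equal-length and named.
attr(datalist, 'row.names') <- seq_along(datalist$url)
class(datalist) <- 'data.frame'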

Is there a better way of doing this? Or, quite possibly, a completely different approach to the problem?

11 Answers