  1. From your Dremio Home Screen, click your AWS S3 storage connection labeled “DevDay” in the bottom left corner of the screen. In the top center of the screen you will see a folder titled “dremio-data-lake-[AccountID]”. Click that folder, then click the “Trips” folder. You should now see Parquet files.
  2. In Dremio you can convert a folder of files into a single Physical Dataset (PDS). In the top right corner of the screen, click the icon that turns the folder of Parquet files into a PDS.
    A pop-up window will appear. Dremio recognizes common file formats and structures the table accordingly; here it detects that the folder is full of Parquet files. You do not need to change anything on this page, so click “Save”.
  3. You are now previewing the Physical Dataset in the Dremio preview window. Notice the purple folder icon in the top left corner of the screen, which denotes a PDS.
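The lab queries the new PDS simply as trips, which assumes the SQL Editor's context is already the folder that contains it. From anywhere else, the PDS is referenced by its full path: the source, the bucket folder, then the converted folder. As a minimal sketch (Python), such a reference can be built by double-quoting each path component, which matters here because the bucket folder name contains hyphens. The helper name dremio_path is hypothetical, "123456789012" stands in for your own [AccountID], and doubling embedded quotes follows standard SQL identifier quoting rather than anything this lab confirms.

```python
def dremio_path(*parts):
    """Join path components into a dot-separated SQL identifier.

    Each component is wrapped in double quotes so that names containing
    hyphens (such as the data-lake folder) parse as a single identifier.
    Embedded double quotes are doubled, per standard SQL quoting (assumed
    to apply to Dremio as well).
    """
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


# "123456789012" is a hypothetical stand-in for your own [AccountID].
trips_path = dremio_path("DevDay", "dremio-data-lake-123456789012", "Trips")
print(f"SELECT count(*) FROM {trips_path}")
```

Run as-is, this prints SELECT count(*) FROM "DevDay"."dremio-data-lake-123456789012"."Trips", which is the fully qualified form of the count query used in the next step.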


  4. Let’s explore the dataset. How many records are there? Type SELECT count(*) FROM trips into the SQL Editor and click Run.
    There are 862,736,805 records in this dataset.
  5. Navigate back in your browser to return to the screen with your original query (SELECT * FROM trips).
  6. Next, save this table as a Virtual Dataset (VDS) in your Dremio Space. Click the “Save View” button in the top right corner of the screen and save the VDS as NYC_Taxi to the space you created for this lab (in the save dialog, this space appears underneath your personal folders).
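The count query and the Save View step can also be scripted instead of clicked through. The following is a minimal Python sketch against Dremio's v3 REST API as we understand it: POST /api/v3/sql submits a statement and returns a job id, GET /api/v3/job/{id} reports the job state, and GET /api/v3/job/{id}/results returns rows. Everything else is an assumption: the localhost URL, the Bearer personal-access-token header (older Dremio releases authenticate differently), the helper names build_sql_payload and run_sql, the space name MySpace, and the account ID. CREATE VIEW is treated here as the SQL counterpart of Save View; some Dremio versions use CREATE VDS instead, so check your version's SQL reference.

```python
import json
import time
import urllib.request

# Assumptions: a Dremio coordinator at this URL (9047 is the usual web port)
# and a personal access token sent as a Bearer token.
BASE_URL = "http://localhost:9047"


def build_sql_payload(sql, context=None):
    """Request body for POST /api/v3/sql.

    `context` is an optional list of path components used to resolve
    unqualified names such as `trips`.
    """
    payload = {"sql": sql}
    if context:
        payload["context"] = list(context)
    return payload


def _call(method, url, token, body=None):
    """Send one JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_sql(sql, token, context=None, base_url=BASE_URL, poll_seconds=1.0):
    """Submit `sql`, poll the job until it finishes, return up to 100 rows."""
    job = _call("POST", f"{base_url}/api/v3/sql", token,
                build_sql_payload(sql, context))
    job_id = job["id"]
    while True:
        state = _call("GET", f"{base_url}/api/v3/job/{job_id}", token)["jobState"]
        if state == "COMPLETED":
            break
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"Dremio job {job_id} ended in state {state}")
        time.sleep(poll_seconds)  # job still queued or running; wait and re-check
    results = _call("GET",
                    f"{base_url}/api/v3/job/{job_id}/results?offset=0&limit=100",
                    token)
    return results["rows"]


# Example usage (not executed here; it needs a reachable Dremio instance).
# MY_TOKEN, MySpace, and the account ID below are hypothetical placeholders.
#   ctx = ["DevDay", "dremio-data-lake-123456789012"]
#   run_sql("SELECT count(*) FROM trips", token=MY_TOKEN, context=ctx)
#   run_sql('CREATE VIEW "MySpace"."NYC_Taxi" AS SELECT * FROM '
#           '"DevDay"."dremio-data-lake-123456789012"."Trips"', token=MY_TOKEN)
```

Polling keeps the sketch dependency-free; the view statement uses a fully qualified source path so it does not depend on how the context setting interacts with the target space.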