Using your workspace

DataLab

Getting started, accessing your data files, available software, locking your workspace and signing out

Released
4/11/2021

Getting started in the DataLab workspace

When you have successfully logged into your Virtual Machine (VM) instance, your DataLab workspace looks like this:

DataLab workspace

You can use DataLab in a similar way to using other secure networked systems, where you can securely see, use and share data files, analysis and output with the other members of your project team.

Open File Explorer and click on This PC to see the network drives you have access to:

  • Library: All researchers can see all files in the Library drive. This is where we upload support information, such as statistical language documentation, ANZSIC classification and general access guides for non-standard products. Files cannot be saved to this drive.
  • Output: Any output you want the ABS to clear should be saved to this drive. Only members of your team can see this drive. See also Request output clearance. Information is backed up nightly and retained for 14 days. Information in this folder remains unaffected by a rebuild.
  • Project: A shared space for your team to work in and store all your project files, as well as set up and run Python and R scripts. Only members of your team can see this drive. Information is backed up nightly and retained for 14 days. Information in this folder remains unaffected by a rebuild. The default storage is 1TB. You will need to review and delete unnecessary files as your project files grow over time. If necessary, an increase to this storage can be requested via the Contact us page. There may be a cost for additional storage.
  • Products: Access data files that have been approved for your project. However, it is best to use the My data products shortcut on your desktop as this shows you only the datasets you have been approved to access, rather than all dataset short names. Files cannot be saved to this drive.
  • LocalDisk: If you have been granted local disk space, you can use it to run jobs on offline virtual machines (desktops). You may want to request this option if you are actively involved in multiple projects. There may be a cost associated with attaching local disk space to your VM, and the local disk is only present if it has been allocated to your VM. To request local disk space, contact the ABS via the Contact us page.
  • Drive C can be used to run scripts and create new Python virtual environments that are not managed through Jupyter Notebook, JupyterLab or Spyder. Note that the C drive is destroyed with each 30-day rebuild. Avoid using this drive to save files: space is limited and the storage capacity cannot be increased. If more storage is required, a local disk can be requested for your VM.
  • Drives A and D are not to be used. Information saved here is either destroyed with each nightly shutdown and 30-day rebuild, or has restricted access. Attempting to read or write from Drives A or D will trigger a group policy error due to access controls. In this case, use the C drive or consult your project lead to request local disk space.
Network drives you have access to
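As noted for Drive C above, you can create Python virtual environments outside the Jupyter/Spyder tooling. A minimal sketch using only the standard library `venv` module; pointing it at a folder on the C drive (e.g. an `envs` folder) is an illustrative choice, not an ABS-prescribed path, and anything there is wiped at each 30-day rebuild:

```python
import venv
from pathlib import Path

def create_project_env(env_dir):
    """Create a standalone Python virtual environment at env_dir.

    In DataLab you might point env_dir at a folder on the C drive --
    remember the C drive is destroyed at each 30-day rebuild, so treat
    environments there as disposable and re-creatable from a script.
    """
    env_path = Path(env_dir)
    # with_pip=False skips bootstrapping pip, which keeps creation fast;
    # clear=True recreates the environment if the folder already exists.
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(env_path)
    return env_path / "pyvenv.cfg"  # marker file written by EnvBuilder
```

Because the environment lives on a disposable drive, keeping the creation step in a script under your Project drive means it can be rebuilt in seconds after a rebuild.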

Do not store files in any other folders. Other members of your project cannot see files if you store them in other drives. Files stored outside of the Project and Output drives are destroyed every 30 days as part of DataLab security protocols.
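Where scripts write results, it helps to target the correct drive explicitly rather than relying on a working directory. A minimal sketch that writes a CSV of results into a folder you pass in; in DataLab that folder would be on your mapped Output drive (for clearance) or Project drive, whose exact paths vary by project, so no drive letter is assumed here:

```python
from pathlib import Path

def save_for_clearance(rows, header, out_dir, filename="results.csv"):
    """Write analysis results as a simple CSV into out_dir.

    out_dir should be a folder on the mapped Output drive if the file
    is intended for ABS output clearance, or on the Project drive for
    shared working files.
    """
    out_path = Path(out_dir) / filename
    # Build one header line plus one comma-joined line per row of results.
    lines = [",".join(header)] + [",".join(str(v) for v in row) for row in rows]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```

Centralising the destination in one helper makes it harder to accidentally leave output in a folder that is wiped by the 30-day rebuild.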

If your network drives do not appear in File Explorer, click the 'Refresh Network' shortcut on the desktop. A confirmation message appears when the drives have been successfully refreshed.

Refreshing your network drives

Accessing your data files

To access the data files for your project, use the 'My Data Products' shortcut on your desktop.

My Data Products shortcut

The My Data Products folder displays only the products approved for your project.

My Data Products folder

Selecting the 'Products' drive shows the short names of all datasets loaded to the DataLab. However, if you try to open a file that is not approved for your project, access is denied and an error is displayed.

Products drive
Error message when accessing a file that is not approved for your project
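In a script, such a denial typically surfaces in Python as a `PermissionError`, which you can handle defensively rather than letting the job crash. A minimal sketch; that the access control presents as a `PermissionError` is an assumption about how the denial reaches Python:

```python
from pathlib import Path

def read_product(path):
    """Read a data product file, returning None if access is denied.

    Files under the Products drive that are not approved for your
    project cannot be opened; this helper reports that cleanly
    instead of raising.
    """
    try:
        return Path(path).read_text(encoding="utf-8")
    except PermissionError:
        # Likely cause in DataLab: the product is not approved
        # for your project -- use the My Data Products shortcut
        # to see what you can access.
        print(f"Access denied: {path} is not approved for this project")
        return None
```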

Available software

Software can be opened using the shortcuts on your desktop or by using search on the Taskbar.

All researchers have access to these applications in the DataLab:

  • LibreOffice 7.5
  • Acrobat Reader
  • Notepad++ 8.5.2
  • QGIS 3.30
  • WinMerge 2.16
  • Git 2.40
  • Stata MP18
  • CUDA 12.1.1
  • R 4.1.3, including:
    • RStudio 2023.03
    • RTools 42
  • Python 3.9 (Anaconda3 distribution) including:
    • Jupyter Notebook & JupyterLab
    • Spyder
  • PostgreSQL 15
  • 7Zip

If required, you can also request:

  • SAS 9.4
  • Azure Databricks

Microsoft Word and Excel are not currently available, as these applications require an internet connection, which is not supported in a secure system like DataLab. LibreOffice is the alternative offered in the system, with similar capabilities to Microsoft Office.

Firefox and Edge are available to support access to Databricks (which is under development) and to Jupyter notebooks running Python/R. These browsers cannot be used to browse the internet.

If a package for your chosen statistical software is not available, you can request it using the Contact us page.
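Before requesting a Python package, it is worth checking whether it is already installed in your environment. A small standard-library helper (available in the Anaconda distribution's Python 3.9):

```python
from importlib import metadata

def package_version(name):
    """Return the installed version of a Python package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
```

For example, `package_version("pandas")` returns a version string if pandas is installed, and `None` if you need to install or request it.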

Managing your R & Python packages explains how you can manage R and Python packages using the Posit Package Manager shortcut on your desktop.

Databricks

Databricks is available to projects within the DataLab as a non-standard product. 

What is Databricks? 

Databricks is a cloud-based Big Data processing platform which provides users with an integrated environment to collaborate on projects and offers a range of tools for data exploration, visualisation and analysis. Within the Databricks environment, users can:

  • Build pipelines for streaming data processing.
  • Build and run machine learning tools.
  • Create interactive dashboards.
  • Take advantage of scalable distributed computing capability.

Project analysts will also have access to the Databricks Academy training subscription (an online library of Databricks training guides), in addition to instructions on how to set up the Databricks workspace, provided in the ABS shared library.

How do I allocate a Databricks workspace to my project? 

To allocate a Databricks workspace to your project, you will need to submit a request to data.services@abs.gov.au. Once your project is allocated a Databricks workspace, it can be accessed from within your VM using the installed Edge or Firefox browsers.

How will costing work? 

Access to Databricks will be charged per project on a financial year basis. Projects have the flexibility to select between a low or high usage profile, each with its own charging profile. The appropriate usage profile depends on how much compute project analysts are expected to consume. The same level of service applies across both profiles.

Usage will be monitored by the ABS, and project leads will be advised if their usage is projected to exceed the charges paid. Should a project exceed the usage of its profile within the financial year, access to this service may be ceased, or continued subject to additional charges. Please see our DataLab charges for more information.

As Databricks uses separate compute power, projects requesting access to Databricks should consider whether they need to maintain their existing VM sizes. Scaling down existing VMs gives users the opportunity to save on project costs.

What are the cluster policy arrangements? 

User analysts can be provisioned with the following cluster policy options: 

Instance  Server purpose     Max autoscale workers  CPU  RAM/Databricks Units
DS3 v2    General purpose    5                      4    14GB/0.75
D13 v2    Memory optimized   4                      8    56GB/2
DS3 v2    Compute optimized  4                      16   32GB/3

Databricks cluster policies will restrict the type and number of workers you can provision for a cluster. If an existing policy does not fit your requirements, you can request a new policy via the ABS. All information regarding this can be found in the ABS shared library.

To ensure the security and integrity of the DataLab, clients will not have administrative access to the Databricks workspace and some usage restrictions may apply. Administration will be exclusively managed by the ABS, aligning with the specified usage restrictions of the DataLab. 

Please contact data.services@abs.gov.au with any questions.

Managing your R & Python packages

If you are working with a specific set of R and/or Python packages, you can now manage these using the Package Manager shortcut on your desktop.

Posit package manager shortcut

In the Package Manager, click 'Get Started' to navigate to the available packages. You can use this tool to search for packages (in the left column) and install the packages you want to use for your project. If the packages you need are not listed, you can request them using the Contact us page.
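For pip-based workflows, packages are typically pulled from the internal package repository rather than the public PyPI (which is unreachable without internet access). A pip configuration along these lines would direct installs there; the index URL below is a placeholder, not the real address, which is shown on the Package Manager's Get Started page:

```ini
# pip.ini (on Windows, usually %APPDATA%\pip\pip.ini)
# The index URL is a placeholder -- copy the real repository URL
# from the Posit Package Manager 'Get Started' page inside DataLab.
[global]
index-url = https://package-manager.example/pypi/simple
```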

Posit Package Manager page where you can check your available packages by clicking Get Started
List of the available packages in your Posit package manager

Virtual machines

What are virtual machines?

Virtual machines, or VMs, are the virtual workspaces you use to undertake your analysis in the DataLab. VMs are created by the ABS as part of the project establishment process, described in About DataLab.

You have one VM for each project. This is a design feature to prevent data from one project being accessed by another project. You can run analysis on multiple virtual machines at the same time, but only if you have been granted local disk space. See Run jobs on offline VMs (desktops). You may want to request this option if you have multiple projects that you are actively involved in.

Virtual machine sizes

The ABS offers standard and non-standard VM sizes. Standard VMs are included in the DataLab annual fee, whereas non-standard VMs are subject to additional charges as they are more expensive to run. For more information on charges, see DataLab charges.

Researchers may request access to a non-standard machine for performance or productivity purposes. If you require a non-standard machine, you will need to consult your project lead and your project lead will need to send the ABS an updated project proposal.

Currently offered VMs and approximate running costs are listed in the tables below.

 
Standard virtual machines

Large VMs are provided as the default and most projects operate efficiently with this size.

If you have a small or medium machine, it can be upgraded to a large at no additional charge. Please contact data.services@abs.gov.au for further assistance.

Standard Virtual Machines
Name    CPU Cores  RAM   Approx cost per hour ($AUD)
Small   2          8GB   Not applicable
Medium  2          16GB  Not applicable
Large   8          64GB  Not applicable

Not applicable: these virtual machines are included in the DataLab annual fee.

 

Non-standard virtual machines

Non-standard machines are available on request. Additional charges will apply, please refer to DataLab charges for more information.

Non-standard Virtual Machines
Name       CPU Cores  RAM    Approx cost per hour ($AUD)
X-Large    16         128GB  $1.80
XX-Large   32         256GB  $3.80
XXX-Large  64         504GB  $6.40

 

Specialised and custom non-standard virtual machines

The following specialised VMs (also non-standard), capable of supporting machine learning and high-performance computing, can also be requested. These are assessed on a case-by-case basis with appropriate justification and are subject to quote. If the required VM is not listed, the ABS may be able to provide a customised option at an additional charge; please describe why the available machines do not meet your needs in any justification provided. A list of virtual machines by region can be viewed via the Azure website.

Assigned names of VMs are unrelated to Azure naming conventions. The ABS reviews the VM options it provides periodically; please revisit this page for any updates.

Specialised and Custom Non-standard Virtual Machines
Name         CPU Cores  RAM     GPU             Approx cost per hour ($AUD)
Large GPU    8          56GB    Tesla T4 16GB   $1.50
X-Large GPU  16         110GB   Tesla T4 16GB   $2.40
M-series     128        2000GB  Not applicable  $28.80

 

Sign out or Lock your DataLab session

When you walk away from your computer or are finished with your DataLab session, you must either lock your workstation or sign out of your account to ensure nobody else accesses your DataLab account.

To lock or sign out of your workspace, click the menu button at the top of your window to expand the toolbar, then select 'Ctrl+Alt+Del' to see the options to lock or sign out of your workspace.

Workspace toolbar expander
Ctrl, alt, delete menu option
Lock menu option

If you need to leave your computer for a short time or you have analysis running, lock your DataLab screen. You can then close your VM window using the X in the top right-hand corner. This closes your session but does not end any programs you have running. Your programs will continue to run until 10pm that night, or longer if you have selected the Bypass option in the portal.

'X' button to exit workspace
Desktop viewer alert received when exiting workspace

Sign out to leave your workspace session. This closes your session and ends any programs you have running.

If you are using the Citrix Workspace portal, you may still be logged into the browser. You can either close this window or log out of the portal using the icon in the top-right corner (with your initial). To log back in, see Logging into the portal and workspace.

Citrix Workspace portal screen where you can close the browser window or Log Out of the portal