Nuffield Department Of Population Health, University Of Oxford

Daily Life

My hours were 9am-5pm, Monday to Friday, and I walked to the museum every morning. Each morning I picked up work on the collection from wherever I had left off the night before and completed any tasks I’d noted on my morning to-do list. I was living with people from my college, so I spent most evenings with them, and on a few nights I went out to the pub with the other interns.

Lasting Impressions

I thoroughly enjoyed my experience and it helped me solidify my career plan for the next few years. I gained a lot of transferable skills and skills directly relevant to what I want to do in the future. I also learnt a lot that will be useful for my degree.

NUFFIELD DEPARTMENT OF POPULATION HEALTH, UNIVERSITY OF OXFORD

Yutong Dai, Wadham College, MMathCompSci Mathematics and Computer Science, First Year Undergraduate, in-person working

Work Projects

I was tasked with improving the rapid reporting tool used by the Clinical Trial Service Unit of the Nuffield Department of Population Health so that it could 1) extract and transform data from multiple databases; 2) generate reports with graphics; 3) run interactive reports; and 4) incorporate scripts into data transformation.

The old reporting solution was a set of Bash scripts that fed SQL queries into database engines and dumped the temporary table storing the result of the transformation into a CSV file, which was then converted into XML and styled using custom style sheets. It was an ad hoc response to the reporting requirements of clinical trial management about 10 years ago and had gradually become inadequate. I concluded that extending it further would be reinventing the wheel, so the old solution should be deprecated in favour of modern ETL (Extract-Transform-Load) and reporting software.
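
As a rough illustration of the query, dump, and convert steps described above, the following Python sketch runs a SQL query, writes the result to CSV, and turns the CSV into XML. It is not the department's actual solution, which was written in Bash. The `recruitment` table, its columns, the in-memory SQLite database, and the `<report>`/`<row>` element names are all hypothetical, and the styling step is left out.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def run_report(conn, query):
    """Run a SQL query and return (column_names, rows)."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def to_csv(columns, rows):
    """Dump a query result to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def csv_to_xml(csv_text):
    """Convert CSV text into <report><row>...</row></report> XML text."""
    root = ET.Element("report")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            # Assumes column names are valid XML element names.
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


def demo():
    """Build a hypothetical table and push it through all three steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recruitment (site TEXT, participants INTEGER)")
    conn.executemany(
        "INSERT INTO recruitment VALUES (?, ?)",
        [("Oxford", 12), ("Leeds", 7)],
    )
    columns, rows = run_report(
        conn, "SELECT site, participants FROM recruitment ORDER BY site"
    )
    conn.close()
    return csv_to_xml(to_csv(columns, rows))
```

Every report in such a design needs its own hand-written glue between the three stages, which is the kind of plumbing that ETL tools provide out of the box.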

I researched a total of 14 ETL and reporting solutions, comparing them by operating system support, functionality, and extensibility, and narrowed them down to two candidates. I then implemented an old report using each of the two tools, found one of them, KNIME (including BIRT), to be significantly superior, and decided to use it. I created six reports in total using KNIME. The first was the old report just mentioned. Of the remaining five, some were demonstrations of KNIME's functionality, while others were new reports requested by the investigators. The reports incorporated features that were not achievable with SQL alone, such as data transformations scripted in Python and sophisticated charts like bar charts with confidence intervals. I also used some R to draw more advanced graphics, such as a bubble chart superimposed on a map of the UK.
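
To give a sense of the numbers behind a bar chart with confidence intervals, here is a minimal Python sketch that computes a mean and an approximate 95% interval per group. The report does not say how the intervals were actually calculated, so the normal approximation (`z=1.96`), the function names, and any input data are assumptions made purely for illustration.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    mean = statistics.fmean(values)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * sem, mean + z * sem


def summarise(groups):
    """Map each group label to its (mean, lower, upper) triple."""
    return {label: mean_with_ci(vals) for label, vals in groups.items()}
```

Each bar would be drawn at the group mean, with an error bar spanning the lower and upper bounds. For small groups, a t-distribution multiplier would be a more appropriate choice than the fixed 1.96.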

To satisfy the requirement for interactive reports, I wrote an R Shiny application that supports simple filtering of a data table by column value. Because the reports need to be generated periodically on the departmental server, I installed the relevant software on the server and worked through the documentation to figure out how to run everything from the command line. Finally, I wrote a 46-page report documenting my work, especially how to use the software I recommended, so that future developers can quickly adapt to it.

At the end of my internship, I delivered a 45-minute talk to the team. Throughout the internship, my supervisor Sonja helped me understand the old reporting solution and its shortcomings, explained the investigators' requests for new reports, and explained the meanings of, and relations between, the columns in different database tables. She also arranged my access to the databases and supported my requests to NDPH IT regarding computer configuration and software installation. My other supervisors, Karl and Allen, introduced me to their work and to the data protection practices of the department.

Daily Life

The internship was based at the Richard Doll Building on the Old Road Campus, a 20-minute bike ride from my second-year college accommodation. Since the Old Road Campus is situated on a hill, the ride to work was a hard climb, while the ride back required a lot of braking. Settling in simply meant getting the office keys, being granted access to the buildings, and setting up my office desktop. The internship had flexible working hours: I arrived at the office between 9am and 10:30am and left between 4:30pm and 6pm.

My working time was divided between understanding the requirements through conversations with my supervisors, translating SQL statements into KNIME nodes, designing reports with BIRT Report Designer, writing code in Python and R, automating everything on the server, and documenting what I did. In the evenings and at weekends, I cycled to the Main Site of the college to spend time with friends who were staying in Oxford for various reasons. I also previewed some of my second-year course materials and started learning Linux. I travelled to London on a Monday (using one of my four days of paid leave) and to Bristol on a Sunday. Allen invited the team to pubs in Headington twice: once for a general social and once on the final day of my internship.

Lasting Impressions

Overall, I enjoyed my internship. I liked translating SQL into well-organized, well-documented, concise KNIME workflows and designing sophisticated reports in BIRT. I enjoyed contributing to