Nuffield Department Of Population Health, University Of Oxford
Daily Life
My hours were 9am-5pm, Monday to Friday, and I walked to the museum every morning. Each
day I picked up work on the collection from wherever I had left off the day before and
completed any tasks I'd noted on my morning to-do list. I was living with people from my
college, so I spent most evenings with them, and on a few nights I went to the pub with the
other interns.
Lasting Impressions
I thoroughly enjoyed my experience and it helped me solidify my career plan for the next few
years. I gained a lot of transferable skills and skills directly relevant to what I want to do in the
future. I also learnt a lot that will be useful for my degree.
NUFFIELD DEPARTMENT OF POPULATION HEALTH,
UNIVERSITY OF OXFORD
Yutong Dai, Wadham College, MMathCompSci Mathematics and Computer
Science, First Year Undergraduate, in-person working
Work Projects
I was tasked with improving the rapid reporting
tool used by the Clinical Trial Service Unit of the
Nuffield Department of Population Health to
enable 1) extracting and transforming data from
multiple databases; 2) generating reports with
graphics; 3) running interactive reports; and 4) incorporating scripts into data transformation.
The old reporting solution was a set of Bash scripts that fed SQL queries into the database
engines and dumped the temporary table holding the result of the transformation into a CSV file,
which was then converted into XML and styled using custom style sheets. It was an ad hoc
response to the reporting requirements of clinical trial management about 10 years ago
and had gradually become inadequate. I determined that it was pointless to reinvent the wheel, so
the old solution should be deprecated in favour of modern ETL (Extract-Transform-Load) and
reporting software.
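
As an illustration only, the CSV-to-XML step of a pipeline like the old one can be sketched in a few lines of Python. This is not the department's actual script (which was written in Bash); the function name, the tag names `report` and `row`, and the sample data are all hypothetical, and the sketch assumes the CSV has a header row whose column names are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="report", row_tag="row"):
    """Convert CSV text (with a header row) into a simple XML string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for record in reader:
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            # Assumption: each column name is a valid XML element name.
            cell = ET.SubElement(row, column)
            cell.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real trial database.
    sample = "site,recruited\nOxford,120\nLeeds,95\n"
    print(csv_to_xml(sample))
```

A style sheet would then be applied to the resulting XML for presentation, which is the part that dedicated reporting software handles without custom code.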
I researched a total of 14 ETL and reporting
solutions, comparing them according to their
operating system support, functionalities, and
extensibility, and narrowed them down to two candidates.
Then I implemented an old report using each of the
two tools, finding one of them, KNIME (including
BIRT), to be significantly superior, and decided to use it. I created a total of six reports using
KNIME. The first was the reimplemented old report. Of the remaining five, some were
demonstrations of functionalities of KNIME, while others were new reports requested by the
investigators. The reports used features that were not achievable with SQL alone, such as
data transformations scripted in Python and more sophisticated charts such as bar charts
with confidence intervals. I also used some R to draw more advanced graphics such as a bubble
chart superimposed on a map of the UK.
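
To give a sense of what a bar chart with confidence intervals requires beyond a plain SQL query, the sketch below computes a mean and an approximate 95% interval for one group of values. It is a minimal example under stated assumptions, not the code used in the reports: the function name is hypothetical, and the interval uses the normal approximation (mean plus or minus 1.96 standard errors), which may differ from the method the investigators required.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the spread")
    mean = statistics.fmean(values)
    # Standard error of the mean, based on the sample standard deviation.
    half_width = z * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical measurements for a single bar in the chart.
    mean, lower, upper = mean_with_ci([4.0, 5.0, 6.0, 5.5, 4.5])
    print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Each bar in the chart would be drawn at the mean, with the error bar spanning the lower and upper bounds.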
To meet the requirement for interactive reports, I wrote an R Shiny application that supports
simple filtering of a data table by column value. Because the reports needed to be generated
periodically on the departmental server, I installed the relevant software on the server and
worked through the documentation to find out how to run everything from the command line.
Finally, I wrote a 46-page report documenting my work, especially on how to use the software I
recommended, so that future developers can quickly adapt to it.
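
The column-value filtering that the interactive report supports amounts to the operation sketched below. The application itself was written in R Shiny; this Python version is only a hypothetical illustration of the underlying logic, with made-up column names and rows, and it leaves out the user interface entirely.

```python
def filter_rows(rows, column, value):
    """Keep the rows (dicts) whose entry for `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


if __name__ == "__main__":
    # Hypothetical table, represented as a list of dictionaries.
    table = [
        {"site": "Oxford", "status": "open"},
        {"site": "Leeds", "status": "closed"},
        {"site": "Bristol", "status": "open"},
    ]
    for row in filter_rows(table, "status", "open"):
        print(row["site"])
```

In an interactive report, the column and value would come from the user's selection rather than being fixed in the code.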
At the end of my internship, I delivered a 45-minute talk in front of the team. During my
internship, my supervisor Sonja helped me understand the old reporting solution and its
shortcomings, explained the requests for new reports from the investigators, and explained
the meanings of, and relationships between, the columns in different database tables. She also
arranged my access to the databases and supported me in my requests to NDPH IT regarding
computer configuration and software installation. My other supervisors, Karl and Allen,
introduced me to their work and the data protection practices of the department.
Daily Life
The internship was based at the Richard Doll Building on the Old Road Campus, a 20-minute
bike ride from my second-year college accommodation. Since the Old Road Campus is situated
on a hill, the bike journey to the workplace was difficult, while the journey back required a lot
of braking. Settling in simply meant getting the office keys, being granted access to the
buildings, and setting up my office desktop. The internship had flexible working hours:
I arrived at the office between 9am and 10:30am and left between 4:30pm and 6pm.
My working time was divided between understanding the requirements through conversation
with my supervisors, translating SQL statements to KNIME nodes, designing reports with BIRT
Report Designer, writing code in Python and R, automating everything on the server, and
documenting what I did. In the evenings and at weekends, I cycled to the Main Site of the
college to hang out with friends who were staying in Oxford for various reasons. I also
previewed some of my second-year course materials and started learning Linux. I travelled to
London on a Monday (using one of my four days of paid leave) and to Bristol on a Sunday.
Allen invited the team members to pubs in Headington twice: once for a general social and
once to mark the final day of my internship.
Lasting Impressions
Overall, I enjoyed my internship. I liked translating SQL into well-organized, well-documented,
concise KNIME workflows and designing sophisticated reports in BIRT. I enjoyed contributing to