Control Sheet Cosylab’s Newsletter Volume 25
ISSN: 1855-9255
December 2015

Table of Contents

Solaris Shining!
On September 21, the Solaris Synchrotron was officially inaugurated. We take a look at the project.

OpenXAL: Scientist meets Software Engineer
A reflection on the fruitful cross-disciplinary cooperation on the OpenXAL project.
Fast Data Storage System for PAL-XFEL
Cosylab has developed a fast data storage and archiving system for the PAL-XFEL project.

The Picture Board: ICALEPCS 2015

New Year 2016
Cosylab d.d., Teslova ulica 30, SI-1000 Ljubljana, SLOVENIA Phone: +386 1 477 66 76 Email: controlsheet@cosylab.com URL: www.cosylab.com
Solaris Shining!
By: Frank Amand (Cosylab) and Igor Dolinšek (Cosylab)

More than a decade ago, Polish synchrotron radiation users formulated the ambition of a Polish national light source facility. Together with 35 research institutes and universities, they founded the Polish Synchrotron Consortium. The project, named Solaris [1], received financing from the European Structural Funds and is based at the Jagiellonian University Campus in Kraków. Solaris is the first Polish synchrotron radiation facility. With Cosylab's contribution to the Solaris project already having moved into the warranty support phase, Control Sheet thought it was the right time to interview Cosylab's project leader, Igor Dolinšek, and ask him a thing or two about the project.

Control Sheet: Igor, thanks for taking the time for Control Sheet. What is the current status of the Solaris project in general?

Igor: You're welcome. On September 21st the Solaris Synchrotron was officially inaugurated by Prof. Stankiewicz in the presence of the Polish Deputy Minister of Science and Higher Education, Prof. Marek Ratajczak. Did you know Solaris is the largest research infrastructure built in Poland since the Maria reactor in Świerk near Warsaw?

Control Sheet: No, we didn't :) How large is the project then?

Igor: Solaris took 5 years to build and the cost, funded by the EU, amounted to approximately 50 million euros. The synchrotron itself has a circumference of 96 m. For now, two beamlines have been built, with at least a dozen more planned.

Control Sheet: And Cosylab's part?

Igor: Cosylab provided services to build the control system (CS) part of the accelerator. It is a TANGO-based system, chosen to leverage work done on the MAX IV synchrotron in Lund, Sweden. Cosylab deployed the TANGO infrastructure at Solaris and offered support in adapting and integrating the MAX IV control system software, developed in Lund, to the Solaris site in Kraków. There were also vendor-provided CS software and physics control code from the TANGO community that needed adaptation to be integrated into this system. We helped here as well. In addition, we developed the Sardana-based control systems for the two beamlines. Cosylab also designed and implemented the timing system for the machine, which is an MRF-based solution.

Figure 1: The completed Solaris facility at the Jagiellonian University Campus in Kraków, Poland. Photo courtesy of Solaris.

Control Sheet: Were there any Solaris-specific software developments?

Igor: Yes, indeed. There were Solaris-specific high-level control applications for the control room. We developed a system to dynamically generate GUIs based on a configuration; we're quite proud of it :) We've shared it with the TANGO community as a customizable solution for future projects.

Control Sheet: So is the work finished?

Igor: We have indeed completed our duties and tasks and fulfilled our contractual obligations. We were involved in early commissioning support and also in timing system commissioning. The machine as a whole is completely equipped and has produced beams of 700 MeV (the LINAC is at full energy at 600 MeV, and ramp-up in the ring to 1.5 GeV is currently possible with 114 mA beam current).

Control Sheet: Big projects can have surprises; were there any in this one?

Igor: No, there were no major surprises; things went smoothly with a modest project delay. I'd dare to say it was a well-managed project.

Control Sheet: Any notes on the collaboration with the TANGO community?

Igor: The collaboration was good. We are documenting the virtual machine we created for Solaris (for easy deployment) and we will share it with the community. It can be seen as an update of the TangoBox: it runs on CentOS, with TANGO 8 and the latest Sardana. It was announced to the TANGO community in early autumn.

Control Sheet: Thank you once again, Igor, for your time, and we wish you success with your next projects; may they be as successful as this one!
ABOUT THE AUTHORS
Frank Amand, Belgian, joined Cosylab in 2011. His previous work experience includes 12 years with Royal Philips in Belgium and the Netherlands in a variety of software engineering roles. His technical expertise lies in the domains of human-computer interaction, GUI design and usability. He is currently Cosylab's Head of Marketing.

Igor Dolinšek led Cosylab's control system integration project for the Solaris synchrotron in Kraków. Before joining Cosylab in 2011, he worked in the software industry for 30 years. His first computer program was written in FORTRAN IV and "edited" on punched cards back in 1974. Since then he has had the opportunity to work on many software projects that were state of the art in their time, and he has hardly missed any hype in the IT industry. He has a BS degree in Computer Science and an Engineering degree in Applied Mathematics from the University of Ljubljana.
REFERENCES [1] http://www.synchrotron.uj.edu.pl/en_GB/
Figure 2: The Solaris Synchrotron
OpenXAL: Scientist meets Software Engineer
By: Ivo List (Cosylab)

Are the famous Maxwell equations the work of science or engineering?
Programming is not Science…
Programming is engineering. Computer science gives us knowledge about the complexity of the problems we would like to solve and about the limitations that computers inherently have. However, Turing machines and Knuth's books do not help us attack real-world problems. Why is the engineering part important? History knows several prominent bugs that caused billions of dollars' worth of damage [1]. Probably the most infamous is Ariane 5 in 1996, where reused older software failed due to an overflow, causing the rocket to crash shortly after launch. Bugs have even caused the deaths of patients, for example at the Therac-25 medical accelerator in 1985 and in Panama in 2000, involving treatment planning software developed by Multidata Systems. Such accidents might have been prevented if engineering experience from software development had been applied. There are many techniques for finding bugs, or even preventing them before they occur. But there is a long way from proving a concept or writing a personal numerical simulation to developing production-quality code that will eventually run on a very expensive machine.
The Cooperation
Oversimplifying, one would think there is not much need for cooperation along the way: for example, drop an e-mail with the concept code to the programmers and let them finish it and put it to use. Just as one could describe a differential equation and give it to a mathematician to solve without explaining the other circumstances, e.g. how or where the solution is going to be used.

Cosylab started working on the OpenXAL project for the European Spallation Source [2] (ESS) in 2013. A big part of the OpenXAL project is envelope simulations and, at first, none of us had a good idea of how developers and physicists should cooperate. We started with simple tasks, explaining along the way why certain techniques are needed. With this understanding the trust grew stronger, and we could then attack the harder problems. The same thing happened on the other side: explaining the circumstances and assumptions about the simulations helped us to understand and improve the model.
Achievements with OpenXAL
OpenXAL [3,4] is an open-source set of libraries written in Java for the creation of accelerator physics applications, scripts and services. OpenXAL is a collaboration between SNS, CSNS, ESS, GANIL, TRIUMF and FRIB. At its core is a standard accelerator framework, i.e. an XML description of the accelerator lattice (along with code for loading it from XML files, manipulating it, EPICS access, etc.), an envelope simulation and a collection of the necessary mathematics and physics utilities. Besides these, it also includes an application framework and a large collection of applications, services and scripts written in Jython and JRuby.

We methodically compared small parts of OpenXAL with other available beam simulation codes. Additionally, input parameters were varied over their whole range to identify edge cases. Differences emerged where there should have been none; some were resolved in OpenXAL, while others turned out to be flaws in the other simulation codes. So the work has also indirectly improved other beam simulation models and code. Automated tests were written as well; these add robustness and prevent the same problems from recurring in the future. The tests cover small parts of the model as well as the whole lattice, comparing the results with either previous simulations or simulations made with other tools.

OpenXAL's model was extended with fieldmap simulations, which were developed from scratch. We collaborated with ESS physicists to write down the equations and polish them until they were useful for numerical simulations. The results were again compared with other beam simulation codes.

Besides the development of the beam envelope code, we have also helped make OpenXAL easier to use. Since its users are more familiar with Python than with Java, we are working on integrating Python into OpenXAL with an IPython front-end [5] for simple and unambiguous scripts. The scripting environment is powerful yet easy to use, and hence reduces the learning curve. Further development of the model will include extensions for non-linear elements and more benchmarking against other beam simulation models to understand the differences. Also, to confirm the correctness of the model, tests are being run on other, similar machines. Ultimately, the goal is to provide a stable and robust implementation for ESS.
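To give a flavour of what driving such an envelope simulation looks like through the OpenXAL API, here is a minimal Java sketch. It follows the open-source OpenXAL packages and factory methods as we understand them; exact signatures may vary between versions, and the sequence ID "MEBT" is purely illustrative.

```java
import xal.model.alg.EnvTrackerAdapt;
import xal.model.probe.EnvelopeProbe;
import xal.sim.scenario.AlgorithmFactory;
import xal.sim.scenario.ProbeFactory;
import xal.sim.scenario.Scenario;
import xal.smf.Accelerator;
import xal.smf.AcceleratorSeq;
import xal.smf.data.XMLDataManager;

public class EnvelopeSimulationSketch {
    public static void main(String[] args) throws Exception {
        // Load the accelerator from its XML lattice description (the default one
        // configured for the installation) and pick a sequence to simulate.
        Accelerator accelerator = XMLDataManager.loadDefaultAccelerator();
        AcceleratorSeq sequence = accelerator.getSequence("MEBT");  // illustrative sequence ID

        // Create an envelope probe driven by the adaptive envelope-tracking algorithm.
        EnvTrackerAdapt tracker = AlgorithmFactory.createEnvTrackerAdapt(sequence);
        EnvelopeProbe probe = ProbeFactory.getEnvelopeProbe(sequence, tracker);

        // Build an online-model scenario for the sequence, synchronize it to design
        // values and run the simulation.
        Scenario scenario = Scenario.newScenarioFor(sequence);
        scenario.setProbe(probe);
        scenario.setSynchronizationMode(Scenario.SYNC_MODE_DESIGN);
        scenario.resync();
        scenario.run();

        // The resulting trajectory holds the simulated probe states along the lattice.
        System.out.println("Recorded probe states: " + probe.getTrajectory().numStates());
    }
}
```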
Conclusions
What does this have to do with Maxwell? Maxwell actually developed a set of 20 equations describing electromagnetism. Heaviside simplified Maxwell's original equations to the four that have become famous. So indeed, a collaboration of scientists and engineers can be fruitful, as we are also demonstrating with the OpenXAL project.

REFERENCES
[1] https://www.sundoginteractive.com/blog/top-ten-most-infamous-software-bugs-of-all-time/
[2] https://europeanspallationsource.se/
[3] http://xaldev.sourceforge.net/
[4] https://ess-ics.atlassian.net/wiki/display/SCA/OpenXAL
[5] http://ipython.org/
ABOUT THE AUTHOR
Ivo List learned programming in primary school and, in high school, won a bronze medal at the International Olympiad in Informatics. His first contact with Cosylab was in 2003. He studied Computer Science and Physics in parallel, both at the University of Ljubljana. After graduating in 2011, he joined Cosylab full-time. He is currently a PhD candidate in Mathematics, working on exact real arithmetic. In his free time Ivo enjoys playing with his dog, Sookie, a mixed breed of basset hound and cocker spaniel.
Fast Data Storage System for PAL-XFEL
By: Ambrož Bizjak (Cosylab)

Cosylab has developed a fast data storage and archiving system for the PAL-XFEL project. The system monitors around 5000 EPICS PVs updated at a frequency of 60 Hz, while keeping at least 5 minutes of short-term history available for retrieval. At the same time, the system supports long-term preservation of data from specific time intervals, covering both already-buffered past data and future data. For example, the system can be configured to preserve the last 5 minutes and the next 1 minute of data in the event of an interlock, or upon manual request by a user. The archiving service also accepts bulk HTTP data uploads from LLRF devices, which are designed to buffer the last 5 minutes of their data internally and only start an upload when an interlock is triggered.
Introduction
The fast data storage and archiving system for the PAL-XFEL project is based on the EPICS Archiver Appliance [1] software, with modifications that implement the data preservation logic and HTTP data uploads, along with many smaller improvements, such as those related to data storage reliability and general performance. A particularly notable improvement which we have implemented in the EPICS Archiver Appliance is infrastructure that allows implementing custom data sources. [Figure 1]
Fast Short-term Channel Access Archiving
The EPICS Archiver Appliance, written in Java, already provides a complete archiving solution designed to scale to millions of PVs. However, it wasn't completely capable of satisfying our specific requirements. We decided that the EPICS Archiver Appliance was still the best way to go, if we only implemented a few additional features. A particularly important feature which we implemented is the temporary storage of the last few minutes of data in RAM, moving this data to disk only when necessary, such as when an interlock occurs. This is essential due to the large volume of data that the archiver receives from the IOCs. We estimate that the archiver processes around 5 MB of data per second (with respect to the storage format), which amounts to more than 400 GB of data per day.
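As a rough plausibility check of these figures (assuming, purely for the sake of the estimate, something on the order of 15 to 20 bytes per stored scalar event in the appliance's storage format): 5000 PVs × 60 events/s × ~17 bytes ≈ 5 MB/s, and 5 MB/s × 86,400 s/day ≈ 430 GB/day, which is consistent with the numbers above.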
Figure 1: Overview of the Fast Data Storage system. Users retrieve data from the fast data storage server; IOCs deliver live data over Channel Access into a short-term RAM buffer, which is conditionally transferred to the hard drive, while LLRF devices deliver their data through HTTP data uploads.

How the EPICS Archiver Appliance Receives and Stores Data
To help explain the details of our adaptation, we first give a brief explanation of how the EPICS Archiver Appliance receives and stores data. The "engine" component of the EPICS Archiver Appliance connects to PVs and receives events using Channel Access subscriptions. These events are generated by the IOCs and include the value of a PV along with other data such as a timestamp and alarm status. The engine inserts the received events into very short internal buffers; there is one buffer for each PV. Periodically (e.g. every 5 s), the engine moves all events which have accumulated in the buffers into the Short Term Store (STS). The STS is the first of possibly a few "lifetimes" of storage, which operate on top of a filesystem using a common data format.
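To make this flow more concrete, here is a small, self-contained Java illustration of per-PV buffering with a periodic flush. It is only a sketch of the mechanism as described above, not the Archiver Appliance's actual engine code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of per-PV event buffering with a periodic flush to a short-term store. */
public class BufferingSketch {
    static final class Event {
        final long timestampNanos;
        final double value;
        Event(long timestampNanos, double value) {
            this.timestampNanos = timestampNanos;
            this.value = value;
        }
    }

    private final Map<String, List<Event>> buffers = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public BufferingSketch() {
        // Every 5 seconds, move whatever has accumulated for each PV to the short-term store.
        scheduler.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
    }

    /** Called from the Channel Access monitor callback for every received event. */
    public void onEvent(String pvName, long timestampNanos, double value) {
        List<Event> events = buffers.computeIfAbsent(pvName, k -> new ArrayList<>());
        synchronized (events) {
            events.add(new Event(timestampNanos, value));
        }
    }

    private void flush() {
        buffers.forEach((pvName, events) -> {
            List<Event> batch;
            synchronized (events) {
                batch = new ArrayList<>(events);
                events.clear();
            }
            // Here the batch would be appended to the PV's current partition in the STS.
            System.out.println(pvName + ": flushed " + batch.size() + " events");
        });
    }
}
```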
The Storage System of the EPICS Archiver Appliance
In a typical installation of the EPICS Archiver Appliance, three storage lifetimes are used: the short-term store (STS), the medium-term store (MTS) and the long-term store (LTS), but it is possible to use as many as needed. The intention of this design is that the lowest layers (towards the STS) are used to store more recent data, and the higher layers (towards the LTS) are used to store older data. The lower layers would typically store less data, but use faster storage technologies to allow faster retrieval; the reasoning is that users are more likely to request recent rather than older data.

Each storage lifetime is configured with its own set of parameters. Since data is kept in time-based partitions (files), the most important parameter is the partition granularity. Possible granularities are 5, 15 and 30 minutes, one hour, one day, one month and one year. For example, with 1-day granularity, all data for one day is kept in its own file. Each PV has its own independent set of data files, which are kept within a directory hierarchy under a configured storage folder. The subdirectories and the names of the data files are determined by the PV name and the time span of the data. For example, the data for the PV "X:Y:Z" for Nov. 3, 2015 may be stored in the file "X/Y/Z:2015_11_03.pb", if 1-day partition granularity is used.

The "etl" component of the EPICS Archiver Appliance moves data from lower to higher storage lifetimes according to the configured rules. Data transfer is always from the current lifetime to the lifetime one higher, e.g. from the STS to the MTS or from the MTS to the LTS. By default, a partition in the lower lifetime is transferred to the higher lifetime as soon as it is complete. In this process, the original partition is destroyed, and the data it contains becomes part of a higher-granularity partition in the higher storage lifetime. For example, data from a 1-day partition in the STS may be moved into a 1-month partition in the MTS. The system can also be configured to keep up to a fixed number of complete partitions in a particular storage lifetime. So, for example, one can configure the MTS to keep data for the past 10 days; this way, a partition with a day's worth of data would only be transferred to the LTS after the data becomes older than 10 days.
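The file-naming convention in this example can be pictured with a toy Java method; the real appliance builds these paths internally, so this is only a sketch of the convention as described above, not its actual code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Toy illustration of mapping a PV name and a date to a 1-day partition file. */
public class PartitionPath {
    public static String dailyPartition(String pvName, LocalDate date) {
        // "X:Y:Z" becomes the path "X/Y/Z", followed by the partition's time span and ".pb".
        String directory = pvName.replace(':', '/');
        return directory + ":" + date.format(DateTimeFormatter.ofPattern("yyyy_MM_dd")) + ".pb";
    }

    public static void main(String[] args) {
        // Prints "X/Y/Z:2015_11_03.pb", matching the example in the text.
        System.out.println(dailyPartition("X:Y:Z", LocalDate.of(2015, 11, 3)));
    }
}
```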
EPICS Archiver Appliance: Modifications

Using the Short Term Store as a Buffer
We used the Short Term Store for the purpose of buffering the last 5 minutes of data. We configured the STS on top of a RAM-based filesystem (tmpfs), with 5-minute partition granularity, and set it to keep at least one whole 5-minute partition. In this configuration, the STS (considering a single PV) follows a pattern: usually there is one completed partition followed by an incomplete partition being filled with live data; when this partition is completed, another partition is started, but very soon after, the oldest complete partition is moved to the next storage lifetime. This does mean that the STS will in the worst case contain 10 minutes of data, even though only 5 minutes are needed, but this inefficiency was acceptable. Better utilization of the STS could be achieved by using partition granularities shorter than 5 minutes, but this would require additional changes in the EPICS Archiver Appliance.

Figure 2: Short-term store partitions. On a timeline running from 15:25 to "now" at around 15:35, a completed partition containing no data less than 5 minutes old can be transferred or deleted, while the partition containing data less than 5 minutes old and the incomplete partition being filled with live data are kept.

So we know how to get the Short Term Store to operate as a 5-minute buffer, but what happens to data which is moved out of the STS? Most often, we want such "old" data to be deleted, except when a request has been made for the data to be preserved. The original EPICS Archiver Appliance did not have this functionality, so we had to implement it.
Lifetime Transfer Gating
This new feature acts as a gate at the transfer between two storage lifetimes; in our case, this is the transfer from the STS to the MTS. For each partition from the source which is considered for transfer, a decision is made whether its data will be transferred to the next storage lifetime or simply deleted. This decision is based on special data-preservation requests to the EPICS Archiver Appliance. Such a request specifies the start and end of the time span for which data is to be preserved. The software remembers these requests in a queue of limited size. Additionally, it detects requests for overlapping time spans and merges them; this ensures that a large number of requests will not cause other, still-relevant requests to be pushed out of the limited queue, as long as the time spans of the requests are within certain bounds.

Figure 3: Lifetime transfer gating. A partition eligible for removal from the STS is written to the hard drive if its time span overlaps with the time span of a preservation request; otherwise its data is discarded.

The direct way to request preservation of data is to issue a specific HTTP request to the "mgmt" component of the EPICS Archiver Appliance, specifying the precise time span. The "mgmt" component then forwards the request to the "etl" components of all servers in the cluster. This indirection is done to support deployments with multiple servers in a cluster, though in our case only a single server is used.

In order to provide a more accessible interface, a separate Java program, ArchiveTrigger, has been developed. This program monitors specific PVs as configured and issues data-preservation (HTTP) requests to the EPICS Archiver Appliance when monitor events are received from these PVs. The time spans of the data-preservation requests are derived from the timestamps in the monitor events. Along with each monitored PV, a past time and a future time are configured; the data-preservation requests will be for the time span [T - PastTime, T + FutureTime], where T is the timestamp included in the event. Our archiving server then hosts a simple PV which is monitored by ArchiveTrigger, so preservation of recent data can be requested simply by putting a value into this PV. ArchiveTrigger can also directly monitor interlock PVs, to preserve data when an interlock occurs.
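The request queue and the gating decision can be pictured with a small Java sketch like the following; the class and method names are hypothetical, and this is only an illustration of the idea, not the actual PAL-XFEL code.

```java
import java.util.Iterator;
import java.util.LinkedList;

/** Illustrative sketch of a bounded queue of data-preservation requests. */
public class PreservationRequests {
    /** A closed time span [start, end], in milliseconds since the epoch. */
    static final class TimeSpan {
        long start, end;
        TimeSpan(long start, long end) { this.start = start; this.end = end; }
        boolean overlaps(long s, long e) { return s <= end && start <= e; }
    }

    private final LinkedList<TimeSpan> queue = new LinkedList<>();
    private final int capacity;

    public PreservationRequests(int capacity) { this.capacity = capacity; }

    /** Add a request, merging it with any overlapping requests already queued. */
    public synchronized void add(long start, long end) {
        Iterator<TimeSpan> it = queue.iterator();
        while (it.hasNext()) {
            TimeSpan span = it.next();
            if (span.overlaps(start, end)) {
                // Grow the new request to cover the overlapping one and drop the old entry.
                start = Math.min(start, span.start);
                end = Math.max(end, span.end);
                it.remove();
            }
        }
        if (queue.size() == capacity) {
            queue.removeFirst();  // the queue is bounded; the oldest request is pushed out
        }
        queue.addLast(new TimeSpan(start, end));
    }

    /** The gating decision: preserve a partition if it overlaps any queued request. */
    public synchronized boolean shouldPreserve(long partitionStart, long partitionEnd) {
        for (TimeSpan span : queue) {
            if (span.overlaps(partitionStart, partitionEnd)) return true;
        }
        return false;
    }
}
```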
Bulk Data Transfers
In addition to receiving live data using Channel Access, our archiving service must accept data from certain devices (LLRF) which internally buffer their own data and only send it to the archiver when it needs to be stored, such as when an interlock occurs. Since we wanted this data to be available in the same manner as data received via Channel Access, we decided to adapt the EPICS Archiver Appliance to accept and store it. We settled on using HTTP for these bulk data transfers, because HTTP enjoys wide support in software, and HTTP POST requests work well for transfers that are initiated by the sender, as in our case. For example, if the data to be uploaded is stored in a file, a call to the "curl" command-line program will upload it to the archiving server. However, we still had to decide on a particular data format. While we considered existing serialization formats such as Google Protocol Buffers and EPICSv4 pvData, in the end a custom format was used.

When an interlock occurs, up to 51 LLRF devices will start a data upload. The archiving server streams this data directly into the MTS (to disk). However, having the server process all these streams at the same time would be wasteful of resources (especially memory and CPU), and the high load could interfere with the performance of CA archiving. Therefore, we have designed the software to limit the number of active HTTP uploads to a configured number. Uploads which cannot be processed immediately due to this limit are put on hold.

The data format of an upload consists of a header which describes the data, followed by a number of fixed-size "samples". Based on the information in the header, the PV names corresponding to the values in the samples are determined. The data in the samples is located at fixed positions without additional metadata. Each sample contains data with the same timestamp, so the timestamp appears only once in the sample.
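As an illustration of this kind of header-plus-fixed-size-samples layout, here is a toy Java reader; the concrete field order, sizes and types are invented for the sketch and do not describe the actual PAL-XFEL upload format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy reader for an upload consisting of a header followed by fixed-size samples. */
public class BulkUploadReader {
    public static void read(ByteBuffer buf) {
        // Header: the number of PVs, then each PV name as a length-prefixed UTF-8 string.
        int pvCount = buf.getInt();
        String[] pvNames = new String[pvCount];
        for (int i = 0; i < pvCount; i++) {
            byte[] name = new byte[buf.getShort()];
            buf.get(name);
            pvNames[i] = new String(name, StandardCharsets.UTF_8);
        }

        // Samples: one timestamp per sample, followed by one value per PV at a fixed offset.
        while (buf.remaining() >= Long.BYTES + pvCount * Double.BYTES) {
            long timestampNanos = buf.getLong();
            for (int i = 0; i < pvNames.length; i++) {
                double value = buf.getDouble();
                // In the real system each (PV name, timestamp, value) event would be
                // appended to that PV's data stream in the medium-term store.
                System.out.printf("%s @ %d = %f%n", pvNames[i], timestampNanos, value);
            }
        }
    }
}
```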
Integration and Generic Data Sources
Since the EPICS Archiver Appliance is designed to archive live data received over Channel Access or EPICSv4 pvAccess, it was not straightforward to make it work with bulk uploads of our custom format. To make the job easier, we developed a layer between the existing EPICS Archiver Appliance software and our custom code which handles the HTTP transfers. We call this the Generic Data Source infrastructure, because it allows developing custom plugins which feed data into the archiver. Due to an experiment we had done using pvAccess for data transfer, this infrastructure supports both live and bulk transfers. Our LLRF/HTTP transfer implementation is built on top of this infrastructure.

Figure 4: Generic Data Source architecture. The common engine code of the EPICS Archiver Appliance engine component contains the default CA/pvAccess archiving code alongside a Generic Data Source Layer, whose Data Source API is implemented by data source plugins such as the LLRF/HTTP upload plugin and possible future plugins.
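To give an idea of the kind of interface such a plugin layer can expose, here is a purely hypothetical sketch; the actual Data Source API in our published changes differs in its details.

```java
/**
 * Hypothetical sketch of a data-source plugin interface; the actual
 * Data Source API in the published changes differs in its details.
 */
public interface DataSourcePlugin {
    /** Called once at startup with a sink that accepts events for archiving. */
    void start(EventSink sink) throws Exception;

    /** Called at shutdown; the plugin must stop producing events. */
    void stop();

    /** Where a plugin hands events over to the archiver for storage. */
    interface EventSink {
        void submit(String pvName, long timestampNanos, double value);
    }
}
```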
Performance
The system as deployed is capable of working with 5000 scalar CA PVs updating at a rate of 60 Hz, without any loss of data. It is able to process an HTTP data upload amounting to 18 GB in less than 3 minutes, which ends up consuming about 29 GB of disk space. This increase in storage space is due to the storage format of the EPICS Archiver Appliance, which stores meta-information for each PV, such as the timestamp and alarm status. Additionally, the HTTP upload does not appear to cause any data loss for CA archiving.
Conclusions
In addition to implementing the required features in the EPICS Archiver Appliance, we also corrected many bugs and implemented a few smaller features which improved the usability of our system. For example, we implemented corrections that allow the software to recover from an event which leaves incomplete data written to disk, such as running out of disk space or a crash; previously, manual intervention was required. Our improvements to the EPICS Archiver Appliance software have been published publicly [2], and we expect that most of them will eventually be integrated into the upstream code.
REFERENCES
[1] EPICS Archiver Appliance (http://slacmshankar.github.io/epicsarchiver_docs/index.html)
[2] https://github.com/slacmshankar/epicsarchiverap/pull/8
ABOUT THE AUTHOR
Ambrož Bizjak graduated with a degree in Computer Science and Mathematics from the Faculty of Computer and Information Science at the University of Ljubljana in 2013. He joined Cosylab in 2014 as a software developer and has primarily worked on a project involving a large medical device. In his free time, Ambrož likes to work on open-source software projects and with 3D printers; indeed, he develops his own firmware for 3D printers. Ambrož also likes sports; he frequently goes running or cycling, and trains in Ju-Jitsu.
That’s one tasty Kangaroo...
ICALEPCS 2015
The 15th International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS 2015) was held in Melbourne, Australia, at the Melbourne Convention and Exhibition Centre (MCEC). It was jointly hosted by the Australian Synchrotron (AS) and the Australian Nuclear Science and Technology Organisation (ANSTO). Here are some images captured by the Cosylab ICALEPCS 2015 team.
Deep tunes provided by a local band
Getting to know some local wildlife
White Rabbit, not just a timing system!
Kudos from PSI's Simon Ebner to Matej and Tom for their work on EPICS Channel Access in Java. Simon joked that the Cosylab logo was not on the slide because he did not yet have the new T-shirt. This was quickly remedied! The conference dinner was held at the Melbourne Cricket Ground. Yes, that's big all over the empire.
Australian Synchrotron, ably controlled by a marsupial.
Cosylab d.d., Teslova ulica 30, SI-1000 Ljubljana, SLOVENIA Phone: +386 1 477 66 76
Email: controlsheet@cosylab.com
URL: www.cosylab.com