
Exploring Remote Operation for ALMA Observatory

Tzu-Chiang Shen (a), Ruben Soto (a), Nicolás Ovando (a), Gaston Velez (a), Soledad Fuica (a), Anton Schemlr (a), Andres Robles (a), Jorge Ibsen (b), Giorgio Phillipi (b), Emmanuel Pietriga (c)

(a) Atacama Large Millimeter/submillimeter Array, Av. Alonso de Córdova 3107, Santiago, Chile; (b) European Southern Observatory, Av. Alonso de Córdova 3107, Santiago, Chile; (c) INRIA, Av. Apoquindo 2827, Santiago, Chile.

ABSTRACT

The Atacama Large Millimeter/submillimeter Array (ALMA) will be a unique research instrument composed of at least 66 reconfigurable high-precision antennas, located on the Chajnantor plain in the Chilean Andes at an elevation of 5000 m. The observatory has another office in Santiago, Chile, 1600 km from the Chajnantor plain. In the Atacama desert, the excellent observing conditions come with precarious living conditions and extremely high operating costs: flight tickets, lodging, infrastructure, water, electricity, etc. A purely remote operational model is clearly impossible, but we believe that a mixed remote and local operation scheme would benefit the observatory, not only by reducing costs but also by increasing the observatory's overall efficiency. This paper describes the challenges faced and the experience gained in a proof-of-concept experiment. The experiment was performed over the existing 100 Mbps link, which connects both sites through third-party telecommunication infrastructure. During the experiment, all existing capabilities of the observing software were validated successfully, although clear room for improvement was detected. Network virtualization, MPLS configuration, L2TPv3 tunneling, NFS tuning, and operational workstation design are all part of the experiment.

Keywords: ALMA, Remote Operation, NFS, MPLS, L2TPv3

1. INTRODUCTION

The Atacama Large Millimeter/submillimeter Array (ALMA) will be a unique research instrument composed of at least 66 reconfigurable high-precision antennas. The observatory is located in northern Chile, 1600 km away from the headquarters in Santiago, Chile. In the Atacama desert, the excellent observing conditions come with precarious living conditions and extremely high operating costs: flight tickets, lodging, infrastructure, water, electricity generation, etc.

Nowadays, astronomers and technical staff from several cities in Chile travel regularly to the observatory to support operations. As in the rest of the digital world, not all tasks at the observatory require the physical presence of specialists, especially those related to control of the array: astronomers, software engineers, database administrators, system administrators, network specialists, IT help desk staff, etc. The possibility of controlling the array remotely would add flexibility to the current working model. A purely remote operational model is not yet possible, but we believe that a mixed remote and local operation scheme would benefit the observatory, not only by reducing costs but also by increasing its overall efficiency. Reducing the human presence at the observatory would also minimize the impact on the natural environment of the Atacama desert.

This paper describes the experience gained in this proof-of-concept experiment. Section 2 presents the network infrastructure, Section 3 describes the remote operation experiment and summarizes the problems that were detected, and Section 4 provides the conclusions.

2. NETWORK INFRASTRUCTURE

Besides the headquarters in Santiago, the observatory features two operation facilities in the Atacama desert: the Array Operation Site (AOS), located at 5000 m altitude, and the Operation Support Facility (OSF), located at 3000 m altitude with better living and working conditions. From the network point of view, there are three main network domains in ALMA: AOS, OSF, and SCO (the Santiago headquarters). Network connectivity between the AOS and the OSF is provided by an internal 10 Gbps fiber optic link, while connectivity between the OSF and SCO is provided by a third-party telecommunication company, from which an end-to-end bandwidth of 100 Mbps is contracted.


[Figure: map of South America showing the ALMA communication links: the existing 100 Mbps ENTEL IP MPLS link with the L2TPv3 tunnel, and the 1 Gbps fiber optic path; country and ocean labels are illegible in the transcript.]

In 2012, a 1 Gbps link solution was explored to connect the OSF to SCO [4], with the main objectives of increasing the bandwidth, reducing the time needed to transfer science observing data out of the observatory and persist it permanently in the SCO datacenter, and ultimately transferring the data to the ALMA Regional Centers (ARCs) distributed around the world. Having this link available also opens the possibility of establishing a remote operation model.

In the following sections, details of these two links are explained.

2.1 Microwave link

Since the construction phase of the observatory, connectivity between the OSF and SCO has been provided by ENTEL, one of the main telecommunication companies in Chile. The contracted IP MPLS link has a bandwidth of 100 Mbps. At SCO, the link terminates on a 1 Gbps fiber optic circuit, while at the OSF the last hop is a microwave link with 155 Mbps of bandwidth. The company guarantees an end-to-end bandwidth of 100 Mbps.

Figure 1. ALMA fiber optic backbone.

2.2 Fiber optic link

In addition to the 100 Mbps link, ALMA has started executing a project that will provide a private fiber optic link as part of the ALMA communication backbone [4]. At the time the remote operation tests were performed, the section between the OSF and the ESO Paranal Observatory (PAO) was not ready yet (see Fig. 1), so we decided to perform the test between SCO and PAO. Since the distance between PAO and the OSF is marginal compared to the whole link, the results obtained should still be representative of the final complete link.

Figure 2. ALMA fiber optic backbone.


[Figure: diagram of the network test setup between ESO-Paranal and ESO-Vitacura over the EVALSO infrastructure; device labels and IP addresses are illegible in the transcript.]

For the purpose of the test, two additional fiber optic links between PAO and SCO were established over the existing EVALSO communication infrastructure [3]. These two fiber optic links will be part of the final links between SCO and the OSF. For the remote operation test, only one of them was used. Note that these two links are completely isolated from the existing ones that provide connectivity to PAO.

Before the actual remote operation tests, connectivity between SCO and PAO was verified by simulating traffic (see Fig. 2), which showed a rough throughput of about 980 Mbps out of the nominal bandwidth of 1 Gbps. The average latency is around 19 ms, which is consistent with the specification provided by EVALSO. Regarding packet loss, after several hours of testing with the simple ping command-line utility, not a single packet was reported lost (0%).
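As an illustration of this kind of link verification, the following minimal sketch drives the standard ping utility and extracts the loss and latency statistics from its summary. The target hostname is a placeholder and the parsing assumes the summary format printed by Linux ping; it is a sketch, not the setup actually used in the tests.

import re
import subprocess

def probe_link(host: str, count: int = 100) -> dict:
    """Ping `host` and return packet loss (%) and average RTT (ms).

    Parses the summary lines printed by the Linux ping utility, e.g.:
      100 packets transmitted, 100 received, 0% packet loss, time 99134ms
      rtt min/avg/max/mdev = 18.902/19.113/20.240/0.311 ms
    """
    out = subprocess.run(
        ["ping", "-c", str(count), "-q", host],
        capture_output=True, text=True, check=True,
    ).stdout
    loss = float(re.search(r"(\d+(?:\.\d+)?)% packet loss", out).group(1))
    avg_rtt = float(re.search(r"= [\d.]+/([\d.]+)/", out).group(1))
    return {"loss_percent": loss, "avg_rtt_ms": avg_rtt}

if __name__ == "__main__":
    # "pao-gw.example" is a hypothetical name for the remote endpoint at PAO.
    print(probe_link("pao-gw.example"))  # e.g. {'loss_percent': 0.0, 'avg_rtt_ms': 19.1}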

Figure 3. Network setup to run the ALMA software in simulation.

3. REMOTE OPERATION EXPERIMENT

Remote operation is nothing new in the history of ALMA: remote control of the array has been in place since the very first antenna reached the Chajnantor summit (AOS). The control room and the datacenter are located at the OSF, where array commissioning activities and science observations have been conducted since 2009, while the antennas sit on the Chajnantor plain.

Before the final link becomes available, several validation activities were carried out to demonstrate the concept and to learn in advance which modifications would need to be introduced in the current ALMA software and IT infrastructure. In 2013, the first experiment was performed over the existing 100 Mbps link. In 2014, the fiber optic project achieved a partial milestone: SCO was connected to the Paranal Observatory, located 400 km away from the OSF. The same experiment was repeated over this partial segment, now with a bandwidth of 1 Gbps and improved latency.

The ALMA software was developed on top of ACS, a distributed programming framework based on CORBA [6]. At runtime, the ALMA software is deployed across hundreds of servers, which are grouped as a single unit under the concept of a Standard Test Environment (STE) [1]. Within an STE, class C subnets are defined according to the logical functionalities of servers and workstations. Communications across subnets are routed by the main Cisco 6500, a layer 3 switch. During the construction period, in order to establish multiple STEs to process antennas in parallel, Virtual Routing and Forwarding (VRF) was configured [2]. Each STE and its class C subnets are associated with a specific VRF domain. The subnet of interest for the remote operation experiment is the one dedicated to the operator consoles. This subnet has to be extended to the remote office (Santiago) over a third-party telecommunication network over which we have no control. In the production environment, four dedicated workstations (with four screens each) are operated by array operators and astronomers during science observations. On these workstations, user accounts and some observation software are mounted through NFS from a centralized repository.
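To make the addressing scheme concrete, here is a small sketch of how one class C (/24) subnet per functional group could be carved out for each STE. The group names, base prefix, and VRF naming are hypothetical illustrations, not the actual ALMA address plan.

from ipaddress import ip_network

# Hypothetical plan: each STE owns a /16, and each functional group
# inside the STE gets one class C (/24) subnet carved out of it.
FUNCTIONAL_GROUPS = ["control", "correlator", "archive", "consoles", "general"]

def ste_subnets(ste_index: int, base: str = "10.0.0.0/8") -> dict:
    """Return a {group: /24 subnet} map for one STE (illustrative only)."""
    ste_block = list(ip_network(base).subnets(new_prefix=16))[ste_index]
    class_c = ste_block.subnets(new_prefix=24)
    return dict(zip(FUNCTIONAL_GROUPS, class_c))

# Two STEs running in parallel, each mapped to its own VRF domain.
for ste in (1, 2):
    print(f"STE-{ste} (VRF vrf-ste{ste}):", ste_subnets(ste))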

During the experiment, the usual science observation activities were reproduced: a) system startup, b) hardware setup, and c) science observations such as a "simple continuum" test, a "delay calibration" test, and an "interferometry pointing" test, all in manual mode. As the observations were being executed, all the observation panels were displayed in order to verify the performance of the remote consoles.

3.1 Over microwave link

The first experiment was conducted in November 2013, over the 100 Mbps link. In order to extend the VRF domain to SCO, the Layer 2 Tunneling Protocol version 3 (L2TPv3) was enabled and configured on the border switches. This allows multiprotocol layer 2 communication over an IP network, so that the VLAN defined for the operators' consoles is visible at both the OSF and SCO.
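The experiment used Cisco border switches; as a rough illustration of the same idea on commodity hardware, the sketch below builds a static L2TPv3 pseudowire between two Linux endpoints with iproute2 and bridges the resulting interface into the console VLAN. All addresses, tunnel IDs, and interface names are hypothetical, and this is an assumed Linux-side equivalent, not the configuration used in the test.

import subprocess

def sh(cmd: str) -> None:
    """Run a shell command, aborting on failure."""
    subprocess.run(cmd, shell=True, check=True)

# Hypothetical endpoint addresses of the two border routers.
LOCAL, REMOTE = "198.51.100.1", "198.51.100.2"

# Static L2TPv3 tunnel over IP (run as root on each endpoint, with
# LOCAL/REMOTE swapped on the far side).
sh(f"ip l2tp add tunnel tunnel_id 10 peer_tunnel_id 10 "
   f"encap ip local {LOCAL} remote {REMOTE}")

# One session inside the tunnel; this creates a virtual ethernet
# device (l2tpeth0) that behaves like a layer 2 wire to the far side.
sh("ip l2tp add session tunnel_id 10 session_id 1 peer_session_id 1")
sh("ip link set l2tpeth0 up")

# Bridge the pseudowire with the local port carrying the operator
# console VLAN, so the same broadcast domain spans both sites.
sh("ip link add br-consoles type bridge")
sh("ip link set br-consoles up")
sh("ip link set l2tpeth0 master br-consoles")
sh("ip link set eth1 master br-consoles")  # eth1: hypothetical VLAN port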

The production environment with real hardware was used, and three arrays were created: a) one array with 25 antennas, the BL correlator, and photonic reference #1; b) one array with 2 antennas for "total power" observation and photonic reference #2; and c) one array with 6 CM antennas, the ACA correlator, and photonic reference #4.

During the preparation it was detected that the NFS (version 3) setup on the remote consoles was prohibitive over the 100 Mbps link. The latency was too high (60 ms): simply listing a directory could freeze the workstation, and after the timeout, kernel panic messages were shown. To cope with this problem, we had to modify the deployment and convert all the NFS-based directories into local copies on each console. The high latency also caused delays of several seconds (sometimes even minutes) in most of the scripts and panels when establishing CORBA connections to remote components running in the datacenter at the OSF, but once the panels were activated, the responsiveness was as usual. During system startup and the observations, components that require a connection to the central database also showed a huge degradation in performance.
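The directory-listing freeze is what one would expect from a chatty protocol over a 60 ms round trip: NFSv3 issues roughly one LOOKUP/GETATTR round trip per entry, so latency, not bandwidth, dominates. A back-of-the-envelope check (the entry count and window size below are illustrative assumptions, not measurements from the test):

# Rough cost model for synchronous request/response protocols over a WAN.
RTT = 0.060  # seconds, measured round-trip time on the 100 Mbps link

# Listing a directory with NFSv3: roughly one round trip per entry
# (LOOKUP/GETATTR), so a 2000-entry directory takes about two minutes.
entries = 2000          # illustrative directory size
print(f"ls time  ~ {entries * RTT:.0f} s")               # ~120 s

# A single TCP stream is also capped by window size / RTT.
window = 64 * 1024      # bytes, a conservative default window
print(f"TCP cap  ~ {window * 8 / RTT / 1e6:.1f} Mbps")   # ~8.7 Mbps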

3.2 Over fiber optic link

In the first quarter of 2014 the previous test was repeated. Contrary to the previous test, this time we could not reach the production environment and control the real hardware at the OSF. It was therefore decided to simulate the array and its associated infrastructure at PAO, while the consoles located at SCO remained the same as in the previous experiment. In preparation, some computing infrastructure had to be installed in the datacenter at PAO, specifically a testing ALMA Archive and a simulated STE [1].

A minimalistic version of the ALMA Archive was created for this occasion, composed of a single node for the Oracle database and an additional node, named NGAS, for storing binary data [7]. Two sets of this version of the archive were configured: one deployed in the datacenter at PAO and the other installed in the datacenter at SCO. Database replication was enabled, as in the production environment. Many data-rate tests were performed using this infrastructure together with the dedicated fiber link. These tests were oriented toward determining the performance of the link when transferring large amounts of data; the response time for data replication was also measured. Both tests showed good results; however, we will not go into the details, since they are out of the scope of this paper.

On the other hand, a simulated STE was prepared. Considering that the STE had to be installed in a remote location, and in order to simplify transportation and deployment in PAO's datacenter, the servers were virtualized inside a single physical machine, as explained in [1]. Thus, 7 servers were virtualized and configured to connect to the testing archive. ALMA software and hardware simulation components were deployed on the servers, as shown in Figure 4, respecting the same deployment and configuration used in production: one virtual machine, gas04, was dedicated to simulating 6 antennas, and another two virtual machines were dedicated to simulating the correlator.

As part of the simulated STE preparation, before shipping the machine to PAO, some preliminary tests were done with the STE located at SCO and the archive at PAO. An interesting behavior was observed: the system startup was extremely slow, taking about 50 minutes, whereas in operation the startup takes around 10 minutes. This showed that the huge number of queries that the ALMA Common Software (ACS) sends to the database (this time located remotely) during system startup is prohibitive. This effect was not observed once the STE was moved to the datacenter at PAO.
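The 40 extra minutes of startup are consistent with a very large number of sequential database round trips. A rough estimate, assuming the queries are issued strictly one at a time (an assumption for illustration, not something measured in the test):

# How many sequential DB round trips would explain the slowdown?
rtt = 0.019                      # s, measured SCO<->PAO latency
extra = (50 - 10) * 60           # s, extra startup time observed
print(f"~{extra / rtt:,.0f} sequential round trips")  # ~126,316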


[Figure: virtualized STE (AOS-PAR-STE) hosted on a PowerEdge R720xd, with VMware guests GNS (network configuration, user account server), GAS02, GAS03, GAS04, and GAS07 hosting the software components Control, Exec, Obops, Obsprep, Scheduling, Simulator; Archive1, TelCal; DMC, LMC; the simulated antennas DV01-DV06, DA41, PM03; CCC; and CDPMaster with CDPNodes[1-4]. The testing archive (AOS-ARCH-PAR) consists of two SuperMicro nodes: an Oracle node serving database queries and an NGAS node storing/retrieving binary files.]

Figure 4. Virtualized STE simulating the hardware of the ALMA array.

Once the STE was installed at PAO, the tests focused on the overall system performance while science observations were reproduced by executing manual scripts and automatic scheduling blocks. The following table summarizes the tests performed with an array of 4 simulated antennas:

Table 3. High-level performance results of the simulated STE.

    Test case                    | Execution time [min]
    -----------------------------|---------------------
    Control panel startup        | 1
    System startup               | 10
    Array creation               | 1
    Interferometry observation   | 3
    Single dish observation      | 3
    Automatic scheduling block   | 3
    Raw data retrieval           | 1

These results are very similar to the ones we are used to in the production environment. Other aspects related to the performance of the GUIs were also measured; the problems detected during the tests over the microwave link could not be reproduced this time. We believe that the higher packet latency was the root cause of those problems.

4. CONCLUSION

We presented the experience of controlling the ALMA array from a remote control room 1600 km away from the observatory. Two experiments were conducted: one over the existing 100 Mbps link, controlling the real array, and another over the partial 1 Gbps link, controlling a simulated array. At the time of writing, only preliminary, high-level tests have been performed; more systematic and exhaustive tests will be executed in the near future. Once the SCO-OSF link is ready, the same experiment will be repeated, this time controlling the real array.

We have demonstrated that the ALMA software, as of 2014 (Release 10.6), is ready for remote operation, and only a few adjustments have to be introduced. The most critical technical aspect is the centralized file system, which is shared across multiple servers via NFS. This should be avoided at the remote site; instead, different mechanisms to synchronize file systems should be explored. Processes that access the database intensively, such as the ACS and SCHEDULING subsystems, must be configured to use the local copy of the database (the archive at the OSF is replicated in the datacenter at SCO). Mechanisms to introduce such configuration flexibility should be implemented in future releases of the ALMA software.
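One candidate mechanism, sketched below under the assumption that the console directories are read-mostly, is to periodically mirror the central repository onto each console with rsync instead of mounting it over NFS. The paths, host name, and refresh interval are hypothetical.

import subprocess
import time

# Hypothetical source (central repository at the OSF) and local target.
SRC = "repo.osf.example:/export/operations/"
DST = "/opt/operations/"

def mirror() -> None:
    """One-way mirror of the central repository onto the local disk.

    -a        preserve permissions/times, recurse
    -z        compress over the WAN
    --delete  drop files removed upstream so the copy stays faithful
    """
    subprocess.run(["rsync", "-az", "--delete", SRC, DST], check=True)

if __name__ == "__main__":
    while True:          # refresh every 10 minutes; tune to taste
        mirror()
        time.sleep(600)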

The experiment conducted over the 1 Gbps link demonstrated not only better available bandwidth but also improved round-trip delay (packet latency), which in the end is more relevant from the application responsiveness point of view. During these experiments we favored a design in which remote processes communicate among themselves and local graphical displays present their status, instead of concentrating all processes at one site and exporting the display to the remote site. This approach allows more efficient use of the bandwidth and better responsiveness of the graphical user displays.
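To make the trade-off concrete, here is a minimal sketch of the favored pattern: the GUI process runs at the remote site and pulls compact status records over TCP, so only a few hundred bytes cross the WAN per refresh instead of a pixel stream. The host, port, protocol framing, and status fields are all hypothetical.

import json
import socket

def fetch_status(host: str, port: int = 9900) -> dict:
    """Ask a status server for one compact JSON record (newline-framed)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"STATUS\n")
        data = sock.makefile("rb").readline()
    return json.loads(data)

# A local GUI at SCO would call this once per refresh tick; a ~200-byte
# record at 1 Hz is negligible even on the 100 Mbps link, whereas
# exporting a four-screen display would consume a large share of it.
# status = fetch_status("array-status.osf.example")
# print(status.get("antennas_in_array"), status.get("observing_state"))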

Having two control rooms able to control the same array could be problematic if the two sites are not well coordinated. We therefore recommend permanent audio and video communication between both sites, and a new operation protocol should be established to minimize any chance of conflicting access to the array. During the experiments we observed a very positive effect on the staff who work in Santiago and have had few opportunities to get involved in the day-to-day operation of the array (procurement, logistics, administrative, and IT staff, etc.): we perceived a better awareness of their direct and indirect contributions to the observatory. We believe that such a feeling will help create an even more cohesive working environment at the observatory.

Remote operation will also create an opportunity for the organization in terms of savings in operational costs and, more importantly, it will reduce the overall ecological footprint of the project. The real impact of these non-technical aspects is out of the scope of this experiment, and an accurate study should be performed in the near future.

5. ACKNOWLEDGEMENTS

The remote operation experiment was done within the context of our existing collaboration with INRIA and is the result of a very fruitful joint effort between the Computing and Science Operations teams. Tzu-Chiang Shen, Ruben Soto, Soledad Fuica, Anton Schemlr, Gaston Velez, Nicolas Ovando, Andres Robles, Emilio Barrios, Denis Barkats, and Antonio Hales composed the core test team. We also want to thank the Paranal Observatory, especially Ismo Kastinen and Marcus Pavez, for their collaboration.

REFERENCES

[1] Shen, T., Soto, R., et al., "ALMA operation support software and infrastructure", Proceedings of SPIE (2012).

[2] Shen, T., Ovando, N., et al., "Virtualization in network and servers infrastructure to support dynamic system reconfiguration in ALMA", Proceedings of SPIE (2012).

[3] Soto, R., Shen, T., Mora, M., et al., "ALMA Software Regression Tests: The Evolution under an Operational Environment", Proceedings of SPIE (2012).

[4] Filippi, G., Jaque, S., Ibsen, J., et al., "ALMA communication backbone in Chile goes optical", Proceedings of SPIE (2014).

[5] Filippi, G., Jaque, S., et al., "EVALSO: a high-bandwidth communication infrastructure to efficiently connect the ESO Paranal and the Cerro Armazones Observatories to Europe", Proceedings of SPIE (2010).

[6] Schwarz, J., Farris, A., Sommer, H., "The ALMA software architecture", Proceedings of SPIE 5496, p. 190 (2004).

[7] Knudstrup, J., "Next Generation Archive System", http://www.eso.org/projects/dfs/dfs-shared/web/ngas/ (2003).
