[virt-tools-list] libvirt profiles (a.k.a. virtuned) design ideas draft

Tue Jul 17 10:11:31 UTC 2018

On Mon, Jul 09, 2018 at 06:10:28PM +0100, Daniel P. Berrangé wrote:
>On Mon, Jul 09, 2018 at 05:01:25PM +0300, Martin Kletzander wrote:
>> On Thu, Jul 05, 2018 at 05:58:46PM +0100, Daniel P. Berrangé wrote:
>> > On Tue, Jul 03, 2018 at 04:41:52PM -0400, Cole Robinson wrote:
>> > > > ## Brief specification of functionality
>> > > >
>> > > > Currently virtuned aims to provide a consistent way of applying profiles to
>> > > > libvirt VM definitions.  That way management applications don't need to
>> > > > duplicate the implementation in their codebases.
>> > > >
>> > > > ### Functions
>> > > >
>> > > > As a starting point virtuned exposes one function.  As input the function
>> > > > accepts a VM definition with the only restriction being that it is a libvirt
>> > > > domain XML.  However it doesn't have to be complete.  The function applies all
>> > > > relevant profiles to that XML and produces a complete libvirt domain XML.
>> > > >
>> > > > The outcome of this is twofold:
>> > > > - Every libvirt domain XML is already working virtuned XML.
>> > > > - Applications can select, by arbitrarily small steps, how much functionality
>> > > >   they want to use from virtuned.
>> >
>> > I'm not sure I understand this second point. IIUC, the contents of the profiles
>> > are supposed to be opaque to the mgmt application. So while they use virtuned,
>> > they'll be exposed to whatever arbitrary XML the profile contains, whether
>> > they understand it or not.
>> >
>>
>> Why would they need to be opaque to the mgmt app?  Either you are using some of
>> profiles that are shipped with it (in which case the mgmt app developers should
>> know what they are using in the code) or the mgmt app can construct their own
>> profile to be used in which case it should know what it is asking for.
>
>In previous discussions on this topic it was suggested that the selling point
>for profiles was to allow new features to be enabled in multiple mgmt apps
>without having to add support to each mgmt app to format XML, potentially with
>the end user providing arbitrary profiles. This implies that the mgmt app
>considers the profile contents to be opaque. Based on your answer though, it
>seems this is not in fact a goal.
>

It is not.  The first idea was basically just creating basic XML from simple
bits and pieces.  It got out of hand when people started suggesting features, to
the point where it got ridiculous IMHO.

>Not allowing arbitrary black-box profiles would indeed be my preference,
>since I don't think it is practical to support it in the real world given
>the complex interactions that will fall out of that.
>

We definitely need the mgmt app to understand either the profile or the result
of the profile being used (the output XML).  And that needs to happen before the
VM is scheduled, because there might be things that the scheduler needs to
handle.

>> > > > ### API endpoints ###
>> > > >
>> > > > For now the API will be exposed as:
>> > > >
>> > > > 1. Python module - trivial if we're basing it on virt-manager codebase which is
>> > > >    using python
>> >
>> > What are the key reasons/benefits of being part of the virt-manager codebase as opposed
>> > to a standalone project ?
>> >
>>
>> A few things:
>>
>> 1) The XMLBuilder makes it easier to work with the XML, particularly the domain
>>    XML.  This is not that big of a deal since libvirt-go-xml does a good job of
>>    that as well
>>
>> 2) There is existing logic for "intermediate" devices.  By that I mean the
>>    devices that are needed to add the requested one.  For example when
>>    requesting the addition of a SATA disk, there is already logic that figures
>>    out if there is an existing SATA controller with a free slot and adds one if
>>    there is not.  The reason for this is that there might be some defaults
>>    specified which affect the intermediate devices.
>>
>> 3) The possibility of exposing virt-xml and virt-install in the future.  The
>>    former would be used for making changes to the XML and the latter is
>>    something that stateless mgmt apps would like to use (cockpit currently).
>
>FWIW, great as virt-install is, if I was writing a new mgmt application, I'd
>really use GNOME Boxes installer as the benchmark. Most importantly it is
>able to fully automate the installation process from installer media, by
>generating the requisite kickstart files from data provided by libosinfo.
>

The installation process does not concern me.  It simply makes the XML creation
process easier.  And for an MVP it is easier for me to get up to speed with
Python.  But it's not a hard requirement; I don't care what the language is as
long as it doesn't cost more time than the benefit is worth.

A lot of this would go away if there was a way to make libvirt process the VM
definition with only the necessary changes.  It would also help with the
addresses being figured out without running a full-blown libvirt daemon.  Maybe
it will be easier once the split is done?  I don't know.

But thanks for the gnome boxes idea, I'll forward that to the Cockpit dev so
that they can consider it.
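
To illustrate the "intermediate device" logic from point 2 above, here is a
minimal sketch in Python.  It is not virt-manager's actual implementation: the
function name `add_sata_disk`, the per-controller port count and the naive
"count the disks" slot accounting are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: number of ports per SATA controller; adjust for the real
# hypervisor/controller model.
PORTS_PER_SATA_CONTROLLER = 6


def add_sata_disk(domain, source_file, ports=PORTS_PER_SATA_CONTROLLER):
    """Add a SATA disk, adding a SATA controller first if no slot is free."""
    devices = domain.find("devices")
    if devices is None:
        devices = ET.SubElement(domain, "devices")

    controllers = devices.findall("controller[@type='sata']")
    sata_disks = [d for d in devices.findall("disk")
                  if d.find("target[@bus='sata']") is not None]

    # Simplification: compare total disks against total ports instead of
    # tracking which controller each disk is attached to.
    if len(sata_disks) >= len(controllers) * ports:
        ET.SubElement(devices, "controller",
                      {"type": "sata", "index": str(len(controllers))})

    disk = ET.SubElement(devices, "disk", {"type": "file", "device": "disk"})
    ET.SubElement(disk, "source", {"file": source_file})
    # Simplification: sda, sdb, ... (only valid for the first 26 disks).
    dev = "sd" + chr(ord("a") + len(sata_disks))
    ET.SubElement(disk, "target", {"dev": dev, "bus": "sata"})
    return disk
```

The caller only asks for the disk; whether a controller had to be added is an
implementation detail, which is exactly where defaults for intermediate
devices would come into play.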

>> > > > The above example will request a video card with model QXL to exist in the VM
>> > > > definition.  The precise outcome of this depends on the existing devices in the
>> > > > VM definition:
>> > > >
>> > > > - **VM has no video device:** the XML snippet (`qxl` video card) will simply be
>> > > >   added to the list of devices.
>> > > > - **VM has video device with no model specified:** Just fill in the video model
>> > > >   for the existing video card.
>> > > > - **VM has video device with different model:** Add one more video device with
>> > > >   the specified model since multiple video cards are perfectly fine.
>> > > >
>> > > > The above is a very concrete example, but it can be very easily and efficiently
>> > > > generalized for any `<add/>` sub-element.  The only information which is
>> > > > required for said generalization is the knowledge of libvirt's domain XML
>> > > > format.  This could be one of the reasons for virtuned to be spun off of
>> > > > virt-manager's codebase (since most of that information is already there).  The
>> > > > other option would be using
>> > > > [libvirt-go-xml](https://libvirt.org/git/?p=libvirt-go-xml.git) as that should
>> > > > have enough information for this as well <sup id='fn3'>[[3]](#fn3d)</sup>.
>> >
>> > FYI, libvirt-go-xml should have 100% coverage of all XML constructs in the
>> > libvirt schema. Any omissions are entirely due to libvirt's own master XML
>> > test files being incomplete. libvirt-go-xml unit tests check that it can
>> > roundtrip all XML files in libvirt.git without data loss. I don't think any
>> > other XML parser impl for libvirt has the same level of coverage, principally
>> > because none of them do similar kind of testing to prove it.
>> >
>>
>> Coverage is one thing, but another thing is the logic that is in XMLBuilder
>> (even though it's not there for all the elements).  For example if there are
>> different sub-elements allowed based on an attribute.  But even simpler,
>> elements that cannot be duplicated, but in the struct it is saved in a list.  If
>> that is not fully introspectable from the struct tags, then we will need to
>> duplicate the code that already exists in virt-manager if this is a side
>> project.
>
>The way I've modelled things in Go is that when there is a type=XXXX attribute
>that controls which sub-elements are permitted, I've created dedicated structs
>for each sub-schema. In fact you never set any 'type' attribute - we generate
>the type attribute based on which struct you've created for the child content.
>

Oh, then I didn't look enough at the implementation.  If that is guaranteed,
then it helps a lot.
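
For reference, the three outcomes described for the QXL example quoted above
can be sketched as follows.  This is only an illustration under stated
assumptions, not the virtuned implementation: `ensure_video` is a made-up name,
and treating an already-present card of the requested model as "nothing to do"
is my assumption, since the draft does not cover that case.

```python
import xml.etree.ElementTree as ET


def ensure_video(domain, model):
    """Make sure the domain has a video device with the given model."""
    devices = domain.find("devices")
    if devices is None:
        devices = ET.SubElement(domain, "devices")

    videos = devices.findall("video")

    # Assumption: a card with the requested model already satisfies the request.
    for video in videos:
        m = video.find("model")
        if m is not None and m.get("type") == model:
            return video

    # Video device with no model specified: fill in the model.
    for video in videos:
        m = video.find("model")
        if m is None:
            ET.SubElement(video, "model", {"type": model})
            return video
        if m.get("type") is None:
            m.set("type", model)
            return video

    # No video device, or only devices with a different model: add one more.
    video = ET.SubElement(devices, "video")
    ET.SubElement(video, "model", {"type": model})
    return video
```

Generalizing this to any `<add/>` sub-element needs the same kind of knowledge
per element: which child identifies "the same device" and whether duplicates
are allowed.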

>> > Solving these problems would require a combinatorial expansion in the
>> > number of profiles. eg a numa-pc, numa-q35 profile, and then a
>> > networking-nfv-pc, networking-nfv-q35, networking-nfv-numa-pc, and
>> > networking-nfv-numa-q35 profiles. There would then have to be dependencies
>> > expressed to tell the app which profiles can be composed with each other.
>> >
>>
>> So this is how tuned does it and I didn't really like the way the matrix
>> explodes with added dimensions.
>
>At least with tuned I think the range of profiles is probably fairly
>small, since there's only so many tunables that are going to be
>relevant. With the domain XML, our schema is huge, so I could easily
>imagine getting into high double-figures number of profiles. So this
>will explode the matrix way worse than seen with tuned.
>
>> > This still only solves the problem of composing profiles, and does not
>> > consider how to merge with the application defined XML parts. The only
>> > way an application can know if the XML it wants to write, is compatible
>> > with the profiles it has used, is if it parses and understands all the
>> > parts of the profile.
>> >
>>
>> I hear what you are saying, but I don't see why the app would need to parse the
>> profiles.  There can be conditions in profiles (proposed in open questions) that
>> would eliminate the need for multiple profiles for the same thing.  Yes, a DSL
>> would be better for this.  We could just right away use what "xq" provides (see
>> open questions).  That would also solve erroring out.
>
>My point touches slightly on the possible misunderstanding I mention above
>about the scope wrt allowing end user blackbox profiles to be provided.
>
>>
>> > If something was used in the profile that the app doesn't know about,
>> > it could ignore it, but the resulting VM config may well be unrunnable,
>> > or worse, runnable but doing something completely inappropriate.
>> >
>> >
>> > I think these kinds of problems are inherent in any approach which allows
>> > arbitrary user defined XML as the schema for the profiles.
>> >
>> > This is one of reasons why libosinfo didn't base the information it
>> > provides around the libvirt XML schema. Instead it defines its own
>> > domain specific language, and applications only use the features in
>> > it that they actually know how to handle.
>> >
>> > This means if we add some new concept to libosinfo database, applications
>> > are not going to automagically use it, and instead have to add explicit
>> > support. As above though, I think this is inevitable, because it is too
>> > easy to create unrunnable/nonsensical XML configs if you allow arbitrary
>> > user specified XML inputs.
>> >
>>
>> Thanks for the info with the NUMA locality example.  On one hand it would really
>> save us a lot of work if we just used something that exists (by just extending
>> it) and for DSL there is a solution we can use as well.  If not then we can
>> build it from existing parts at least partially.
>
>BTW, I meant to include this link to illustrate the NUMA locality example:
>
>  https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/
>

I'll have a look at that.  We had some talk about the details and it looks like
targeting such advanced features is not the goal now.  We need to keep that in
mind, of course, but we need to start small as there are still numerous
misunderstandings about what we're going to do.

>> > > I didn't really know where to cut in so this is a big comment...
>> > >
>> > > The idea here is that virtuned will ship with something like a
>> > > profile/add-qxl.xml, and profile=add-qxl will then effectively be part
>> > > of the virtuned API, like an osinfo ID value is to libosinfo; the
>> > > profile will never go away, so apps can depend on it being there.
>> > > Presumably we can extend the profile as necessary as long as it
>> > > accomplishes its stated goal and we confirm it doesn't break apps.
>> > >
>>
>> Yes, we're probably going to need to version it as well.
>
>Hmm, yes, versioning would be key for being able to reconstruct the
>exact same machine each time, even after upgrades. That said, it would
>be valid to declare that profiles need to be persisted at time of VM
>creation, per VM. This is how openstack deals with its "flavour"
>concept - at time of VM create we copy the data for the flavour, so
>we always use the original values for the life of that specific VM.
>

So the problem in KubeVirt is that the changes need to be done either
immediately after posting the VM definition (without libvirtd running at all) or
it needs to be reproducible.  Versioning will also help us to be able to change
virtually anything in the future.

>> > > Using XML for this kind of thing makes me nervous, trying to model
>> > > conditional actions with XML. I feel like it's a real quick slippery
>> > > slope to implementing a turing complete schema. For example how would we
>> > > handle complex examples like:
>> > >
>>
>> The idea to use XML was sparked by two facts:
>>
>> 1) Apps will be able to create their own profiles.
>>
>> 2) Simple profiles (addition of few elements) could be created by just taking
>>    the specific part of the domain XML and wrapping it in a tag that says what
>>    to do (e.g. `<add><existing_xml_snippet/></add>`).
>
>FWIW, I'm not opposed to using XML - I think it is valuable to be able
>to use standardized tools for parsing / formatting / editor syntax
>highlighting etc. I'm just wary about using the Domain XML schema itself,
>as opposed to a custom XML schema explicitly designed for this job. If
>nothing else, we've got lots of stupid mistakes in our domain XML schema,
>such as the way we litter CPU/NUMA related bits across 6 different places
>in the schema, making it hard to understand wtf we're expressing.
>

What I was afraid of was creating Yet Another VM Definition Format.  Slightly
unrelated, but I always get reminded of this: https://xkcd.com/927/

>> > > What's the motivation for doing this in XML? So apps or distros can drop
>> > > in their own profiles? Or extend system profiles? I'm wondering why XML
>> > > over privately implemented. Maybe you can explain some specific app
>> > > usecases that motivated this? I feel like I missed a lot in the previous
>> > > discussion
>> > >
>>
>> You didn't miss much and you hit the two points nicely, dropping in own profiles
>> and, possibly, extend existing ones.
>>
>> > > Also do we expect the API to talk directly to libvirt? Like for checking
>> > > domcapabilities?
>> >
>>
>> For KubeVirt that wouldn't be that much of a help as they need to do a bunch of
>> these things without libvirt running.  Also not being dependent on libvirt makes
>> it independent from the host.  Capabilities might be provided as another input,
>> but question is whether it should be full blown libvirt (dom)capabilities.  The
>> reason is that you might need to migrate between various nodes and the mgmt
>> app/cluster knows the minimal requirements better than host-oriented daemon.
>
>I don't think it is so clearcut for KubeVirt. It is entirely possible for them
>to have a libvirtd spawned to be able to query the capabilities, independently
>of them launching the guest if this is a compelling benefit. It also depends
>on exactly where in their code flow they'll slot in the usage and expansion
>of profiles into full domain XML.
>

Some of the things are scheduler-related, so they need to be done before the
cluster is trying to figure out where the VM is going to be scheduled (based on
the amount of RAM, some devices, whatever else).  Others need to be done after
the Pod is created (for example vCPU pinning based on what vCPUs the Pod will
get allocated).

>> > I tend to think writing the profiles is going to be more complex and
>> > error prone than directly writing the XML, because of the composability
>> > problems I mention above.
>> >
>> > My gut feeling is that it would be a more tractable problem if the profiles
>> > used a domain specific language (DSL), possibly still XML, but not libvirt
>> > domain XML. Applications would have to explicitly know about individual
>> > features in the DSL, but they could consume it in a way that the way they
>> > generate libvirt XML is more fully data-driven.
>> >
>> > ie, taking my example above, applications would need explicit knowledge
>> > of machine types, NUMA topologies, and attaching devices to NUMA nodes.
>> > Given that knowledge though, the decision about /when/ to use these
>> > respective features would be data driven from profiles that simply
>> > stated desired traits.
>> >
>>
>> I lost you at the last paragraph.  Could you rephrase it or maybe give another
>> example?  The idea is that mgmt app knows when it wants to use what profile.
>> And what is provided as an API is the composition of the XML.  But you were
>> probably addressing something else, right?  As I said, I lost you here.
>
>This goes back to the question of scope wrt whether profiles are blackboxes
>that administrators can augment at will, or whether it is strictly limited
>to stuff the application developer has decided to express. If it is the
>latter, then it simplifies the process of expanding the profile to form
>domain XML.
>
>To be clear though, my thought was that if you have a DSL, you could say
>
>  "Place guest on host node 0"
>
>in the profile, and the application would have logic to turn that into
>the domain XML that sets appropriate NUMA tunables in the various different
>places, giving the application the chance to customize them taking into account
>other factors. For example, the app might have been told not to use host CPUs
>0 and 1, as they're reserved for OS processes. It can use that knowledge
>to filter out pinning to CPUs 0 and 1, and only pin to CPUs 2-3 in that node.
>
>If the profile is expressed in terms of domain XML, then the profile would
>be encoding specific host CPU information, and the application would have
>to parse the domain XML and modify all the places which list CPUs to
>remove CPUs 0 and 1. So in that sense having the profile use domain XML
>isn't really simplifying life for the app - it would have been easier to
>just generate the domain XML from scratch rather than parse & modify
>what was written in the profile to remove 2 CPUs.
>

Good point.  Thanks for the ideas.  I'll keep them in mind although, as I said,
it's not very well defined what we're trying to achieve, so I'm trying to frame that
at the same time.
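
To make sure I understand the "Place guest on host node 0" example, here is a
rough sketch of what the application-side expansion could look like.  The
function names, the shape of `host_topology` (node id mapped to a list of host
CPU ids) and the choice of a strict memory policy are assumptions for
illustration only; only the `<vcpu cpuset=...>` and `<numatune><memory .../>`
elements are taken from the existing domain XML format.

```python
import xml.etree.ElementTree as ET


def format_cpuset(cpus):
    """Format CPU ids as a comma-separated cpuset string."""
    return ",".join(str(c) for c in sorted(cpus))


def place_on_node(domain, node, host_topology, reserved_cpus=()):
    """Expand "place guest on host node N" into domain XML tunables."""
    # The app applies its own knowledge (reserved CPUs) before any XML exists.
    usable = set(host_topology[node]) - set(reserved_cpus)
    if not usable:
        raise ValueError("no usable CPUs left on node %d" % node)

    vcpu = domain.find("vcpu")
    if vcpu is None:
        # Assumption: default to a single vCPU when none was specified.
        vcpu = ET.SubElement(domain, "vcpu")
        vcpu.text = "1"
    vcpu.set("cpuset", format_cpuset(usable))

    numatune = domain.find("numatune")
    if numatune is None:
        numatune = ET.SubElement(domain, "numatune")
    memory = numatune.find("memory")
    if memory is None:
        memory = ET.SubElement(numatune, "memory")
    # Assumption: strict placement; a real app might pick another mode.
    memory.set("mode", "strict")
    memory.set("nodeset", str(node))
    return domain


dom = ET.fromstring("<domain><vcpu>2</vcpu></domain>")
place_on_node(dom, 0, {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}, reserved_cpus=[0, 1])
print(ET.tostring(dom, encoding="unicode"))
```

With the profile stating only the trait, the filtering of CPUs 0 and 1 happens
on plain data, and nothing needs to be parsed back out of domain XML.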

Have a nice day,
Martin